Mutations in Uncharted Territory

Genetic dark matter makes up 98 percent of our genomes, yet scientists are still figuring out how it works.

The genes in your DNA serve as blueprints for all your body’s proteins — the tiny machines that carry out crucial functions such as ferrying oxygen, communicating with other cells and fighting infections. Over the last several decades, researchers have made enormous strides toward understanding how mutations in these genes can result in diseases and disorders. Yet genes in protein-coding regions make up only about 2 percent of the human genome; the remaining 98 percent is largely uncharted territory.

Scientists have known for decades that this ‘noncoding’ DNA has important functions, such as regulating gene expression. Noncoding DNA is responsible, for example, for making sure that the genes essential for brain function are turned on in your brain but turned off in your lungs.

But until the past couple of years, no one knew whether mutations in these parts of the genome play a causal role in any complex human disorder. The noncoding genome is so enormous, encompassing nearly 3 billion nucleotides, that almost every individual has some mutations there, most of which are harmless. Now, researchers are finding ways to sift through the massive amounts of genetic dark matter to determine which of these mutations are significant. The picture these teams are developing has confirmed some expectations about the role of the noncoding genome, while also opening up unexpected new avenues for exploration.

One of these studies, published in the December 14, 2018, issue of Science, relies on whole-genome sequencing data from the Simons Simplex Collection, a cohort of thousands of families in which one child has autism. Because of the unique design of the cohort, which includes not just the child’s parents but also an unaffected sibling, it’s possible to compare mutations in the children with autism to those in their siblings, a built-in control group.

Overall, the Science study found, children with autism have approximately the same number of spontaneous (or ‘de novo’) mutations as their siblings do, but they have significantly more mutations than their siblings in ‘promoter’ regions of the genome, which appear just before the start of a gene. This was the first time a genome-wide analysis uncovered a role for any type of noncoding mutation in any human condition.

“Being able to show that de novo mutations in noncoding regions contribute to autism is phenomenally exciting,” says Stephan Sanders of the University of California, San Francisco, one of the leaders of the Science study. “It’s our first chance to really come to grips with rare mutations in the other 98 percent of the genome.”

Some of the mutations, the team found, were in promoters for genes involved in neuronal differentiation or developmental delay, as well as genes that interact with CHD8, one of the most common autism risk genes.

THE 2 PERCENT: An electrophoresis gel, a substance used to separate DNA fragments based on size during genetic sequencing. The genes that encode the blueprints for making proteins only make up a tiny fraction of our genomes. Credit: D. Vo Trung

“All of that collectively fits,” says Alan Packer, a senior scientist at the Simons Foundation Autism Research Initiative. “It’s a reassuring sign that they’re on the right track.”

Meanwhile, a second research group has developed a machine-learning algorithm called ExPecto that pinpoints how specific noncoding mutations can disrupt the way genes turn on and off throughout the body. The algorithm assigns a disease impact score to every single nucleotide in the human genome. “It can predict the impact of any variant anywhere, even if it has never been seen in the human population before,” says Olga Troyanskaya, deputy director for genomics at the Simons Foundation’s Flatiron Institute and one of ExPecto’s creators.

Using this method, Troyanskaya and her collaborators at the Flatiron Institute and Princeton University computed the impact of more than 140 million mutations in different bodily tissues. In a study published in the July 16, 2018, Nature Genetics, the researchers identified mutations potentially responsible for increasing the risk of several immune-related diseases, including chronic hepatitis B virus infection and Crohn’s disease. Study co-author Chandra Theesfeld of Princeton University then experimentally verified the results, finding that ExPecto’s predicted mutations were more promising potential contributors to the diseases than those proposed by previous studies.

Troyanskaya and colleagues next examined the impact of noncoding mutations in the Simons Simplex Collection. In a paper in the May 27, 2019, Nature Genetics, the team found that noncoding mutations in children with autism had significantly higher impact scores than those of their siblings. The researchers then inserted more than 50 of the highest-scoring mutations into cultured cells, where they found that the mutations did indeed affect the levels of different autism-associated proteins that appeared within the cells.

While these studies provide only initial glimpses of the role of noncoding DNA in human disorders, they are enough to provide avenues for further research and even some surprises. For instance, in the Science study, the promoter regions that took the hardest hit in individuals with autism were not regions close to the gene, but ones about 1,000 base pairs away. “It’s an interesting twist, and a slight mystery,” Sanders says. The explanation, he says, may be that these promoter regions act like enhancers — noncoding regions that increase the expression of a gene.

And while many of the noncoding mutations that cause disease probably do so by increasing or decreasing the transcription of a gene, the 2019 Nature Genetics study suggests that a second mechanism is also often at play, at least in autism: Some of the mutations, instead of affecting transcription levels, affect how messenger RNA gets processed after it has been transcribed from a gene. “Perhaps the scope of these latter kinds of mutations has been underestimated,” Packer says.

Ultimately, the 2019 Nature Genetics paper suggests, mutations in the noncoding genome may explain just as many cases of autism as mutations in genes do.

“These noncoding mutations are not just some esoteric thing that some kids might have,” Troyanskaya says. “They are quite likely just as important a player in autism as the coding mutations are.”

Jian Zhou, a Flatiron research fellow in Troyanskaya’s group, anticipates that ExPecto will be particularly useful for studying the evolutionary consequences of mutations. The researchers found, for instance, that mutations were less likely to affect genes expressed throughout the human body than genes specialized for one specific tissue type. “We don’t have a full explanation yet,” he says, but one possibility is that a mutation in a body-wide gene might have a higher likelihood of being fatal or otherwise preventing an individual from passing on their genetic information. “Evolution has already done the experiments for us,” Zhou says.

These three studies are just the first steps in the attempt to unlock the vast amounts of information hidden away in the noncoding portions of the genome. Troyanskaya hopes that eventually these approaches will improve how genetic data are used for diagnosing and treating diseases and disorders. “Right now, 98 percent of the genome is usually being thrown away,” she says. “Our work allows you to think about what we can do with the 98 percent.”