Annual Report

Genes by the Numbers

Illustration:

Gene set enrichment identifies classes of genes that may have an association with disease phenotypes. In this figure, the enrichment of genes within de novo copy number variants (dnCNVs) is shown by the size and color of the circles: Large red circles represent a high degree of enrichment, while the small blue ones represent a modest degree of enrichment. Small de novo deletions in the Simons Simplex Collection (SSC) and Autism Genome Project (AGP) show consistent enrichment for de novo loss of function (dnLoF) and de novo missense (dnMissense) mutations across three cohorts: the SSC, the Autism Sequencing Consortium (ASC) and Deciphering Developmental Disorders (DDD). This type of analysis helped to reveal 65 genes associated with autism spectrum disorder. Adapted from S. Sanders et al. Neuron 87, 1215-1233 (2015).


Over the past six years, gene sequencing studies have definitively linked spontaneous (or de novo) mutations to autism in children who are the only members of their immediate family to have the condition. Studies of the Simons Simplex Collection (SSC) — a repository of data from families with just one child with autism, and unaffected parents and siblings — have consistently shown that the children with autism have more de novo mutations than their siblings.

In 2014, a landmark sequencing study of the protein-coding regions of the genomes of 2,515 families from the SSC identified about 400 loss of function mutations (ones that indisputably disrupt the function of a gene) in some of the children with autism. Yet that doesn't mean that all 400 mutations are in fact connected to autism; about 200 of the children's siblings also have de novo loss of function mutations. Their mutations, though, are presumably benign, at least with respect to autism.

Assuming, as researchers believe, that the children with autism have benign mutations at about the same rate as their siblings, these numbers suggest that only about half of the 400 loss of function mutations in the children with autism fall in true autism risk genes, and the other half are red herrings. But which genes fall into which category?
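The arithmetic behind that estimate can be written out as a short sketch. It assumes, as the paragraph above does, that the sibling count is a fair stand-in for the benign background in the affected children; the function name and its parameters are illustrative, not taken from any of the studies.

```python
def split_mutations(affected_count: int, sibling_count: int) -> tuple[int, float]:
    """Estimate how many mutations in affected children exceed the sibling background."""
    # Assumption: affected children carry benign mutations at the sibling rate,
    # and both groups are the same size, so the sibling count is the background.
    background = min(sibling_count, affected_count)
    excess = affected_count - background
    return excess, excess / affected_count


# Figures from the article: about 400 mutations in affected children, about 200 in siblings.
excess, share = split_mutations(affected_count=400, sibling_count=200)
print(f"Estimated risk-related mutations: {excess} ({share:.0%} of the total)")
```

The calculation says how many mutations are likely to matter, but not which ones, which is the problem the rest of the article takes up.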

To tackle this question, researchers supported by the Simons Foundation Autism Research Initiative (SFARI) have been developing an array of mathematical techniques to weigh the evidence for each candidate gene. Already, these approaches have highlighted between 65 and 100 high-confidence autism risk genes. "Five years ago, we had just a handful of genes that we knew contributed to autism risk," says Alan Packer, senior scientist at SFARI. "Now this longer list is acting as a catalyst to the field, and is leading to important insights into the biology of the disorder."

The impact of these new mathematical methods goes beyond autism, says Stephan Sanders of the University of California, San Francisco, who has been involved in many of the sequencing studies and statistical analyses. The creation of these tools has inspired many other studies, he says, on disorders such as congenital heart disease, infant epilepsy, intellectual disability and other diseases whose genetics follow a similar logic to that of autism. "Autism has really been leading the field in putting these methods out there," he says.

Genetic Birthdays

The mathematical techniques being developed vary considerably, but all are based on one underlying principle: Genetic variations most likely to be related to autism are the unusual ones — ones you wouldn't expect to see in the general population.

This principle was first applied to assess the evidence that comes from 'recurrent' copy number variants and mutations: ones that appear in multiple children with autism. Recurrences should be much rarer among the 200 red herring mutations than among the 200 true autism mutations, as the red herrings are presumably sprinkled fairly randomly among the approximately 18,000 genes in the genome, while the 200 autism mutations are crowded into the much smaller set of genes involved in autism (about 500 to 1,000, researchers estimate). It's as if you're throwing 200 darts at a dartboard with 18,000 sections, and another 200 darts at a portion of the dartboard with only 1,000 sections: The latter darts are much more likely to hit the same section more than once.
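To put rough numbers on the dartboard picture, the sketch below computes how many sections one would expect to be hit more than once. It makes the simplifying assumption that every section is the same size and every dart lands independently; the function name is illustrative.

```python
def expected_recurrent_sections(darts: int, sections: int) -> float:
    """Expected number of sections hit by two or more darts.

    Assumes equal-sized sections and independent throws.
    """
    p = 1.0 / sections
    p_zero_hits = (1 - p) ** darts                   # a given section is missed entirely
    p_one_hit = darts * p * (1 - p) ** (darts - 1)   # a given section is hit exactly once
    return sections * (1 - p_zero_hits - p_one_hit)


whole_board = expected_recurrent_sections(darts=200, sections=18_000)
risk_portion = expected_recurrent_sections(darts=200, sections=1_000)
print(f"18,000 sections: about {whole_board:.1f} sections hit more than once")
print(f" 1,000 sections: about {risk_portion:.1f} sections hit more than once")
```

Under these assumptions, the darts confined to 1,000 sections produce many more repeat hits than the darts spread over the whole board, which is why a recurrence counts as evidence.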

In its simplest manifestation, the dartboard's mathematics is the same as that of the famous birthday problem in probability theory, which asks how likely it is that a group of people will contain at least two with the same birthday: The sections of the dartboard correspond to possible birthdays, and the darts correspond to the people. In reality, though, the picture is more complex. The sections of the genetic dartboard aren't all the same size: Certain genes are more likely to get hit by a dart (that is, have a mutation) than other genes. For example, long genes tend to have more mutations than short genes, just because there are more opportunities to miscopy something. And genes with a high proportion of C-G base pairs are more vulnerable to being miscopied than those with many A-T base pairs.
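Both halves of this paragraph can be illustrated in a few lines: the exact birthday-problem probability for equal sections, and a simulation for sections of unequal size. The weights here are invented purely for illustration (a tenth of the sections made ten times larger) and do not come from real gene lengths or base composition.

```python
import random


def prob_shared_section(darts: int, sections: int) -> float:
    """Exact birthday-problem probability that some section is hit at least twice."""
    p_all_distinct = 1.0
    for i in range(darts):
        p_all_distinct *= (sections - i) / sections
    return 1.0 - p_all_distinct


def simulate_shared_section(darts, weights, trials=2000, seed=0):
    """Monte Carlo estimate of the same probability when sections differ in size."""
    rng = random.Random(seed)
    indices = range(len(weights))
    shared = 0
    for _ in range(trials):
        thrown = rng.choices(indices, weights=weights, k=darts)
        if len(set(thrown)) < darts:  # a repeated index means a shared section
            shared += 1
    return shared / trials


# Classic check: 23 people, 365 birthdays gives a probability just over one half.
print(f"Birthday problem (23, 365): {prob_shared_section(23, 365):.3f}")

equal = prob_shared_section(darts=30, sections=1000)
# Hypothetical board: 100 sections are ten times the size of the other 900.
weights = [10.0] * 100 + [1.0] * 900
skewed = simulate_shared_section(darts=30, weights=weights)
print(f"Equal sections: {equal:.2f}; unequal sections (simulated): {skewed:.2f}")
```

Repeat hits become more likely when some sections are larger, so a recurrence in a long or highly mutable gene is less surprising than one in a small, stable gene.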

Sanders is part of a team — with Xin He of the University of Chicago, Kathryn Roeder of Carnegie Mellon University in Pittsburgh, Bernie Devlin of the University of Pittsburgh, and Matthew State of the University of California, San Francisco — that has developed a method called the 'transmission and de novo association test' (TADA) that calculates the size of each gene's section of the dartboard. TADA can analyze recurrences involving mutations whose link to autism is less clear than that of de novo loss of function mutations: for instance, mutations that are transmitted from parents to children, and 'missense' mutations, in which a single base pair has been miscopied in a way that has not yet been proved to disrupt the working of the gene. In a September 23, 2015, paper in Neuron, the team used TADA to identify 65 genes strongly associated with autism.
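TADA itself is a fuller statistical model than can be shown here. The sketch below is not TADA; it is only a simple Poisson calculation that illustrates why the size of a gene's section matters. The per-copy mutation rates are hypothetical round numbers, and the factor of two reflects the assumption that each child carries two copies of the gene.

```python
import math


def expected_de_novo_hits(families: int, per_copy_rate: float) -> float:
    """Expected de novo mutations in one gene across all affected children.

    Assumes one affected child per family and two copies of the gene per child.
    """
    return 2 * families * per_copy_rate


def prob_at_least(k: int, mean: float) -> float:
    """Poisson probability of seeing k or more events when `mean` are expected."""
    below_k = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


FAMILIES = 2515  # number of SSC families in the 2014 study cited in the article

# Hypothetical per-copy, per-generation loss of function mutation rates.
short_gene_rate = 1e-6
long_gene_rate = 5e-6

for label, rate in [("short gene", short_gene_rate), ("long gene", long_gene_rate)]:
    mean = expected_de_novo_hits(FAMILIES, rate)
    print(f"{label}: chance of 2+ hits by accident = {prob_at_least(2, mean):.2e}")
```

With these illustrative rates, two chance hits in the same gene are rare in either case, but considerably less rare in the gene with the larger section of the dartboard.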

Intolerant Genes

Recurrences, in which the same gene is mutated in more than one child with autism, are the low-hanging fruit of sequencing studies. But the vast majority of the genes these studies have uncovered are mutated in only one child. Even in these cases, however, it's possible to assess which mutations are most surprising, and thereby prioritize them as candidate autism risk genes.

Michael Wigler and Ivan Iossifov, geneticists at Cold Spring Harbor Laboratory, and David Goldstein, a geneticist at Columbia University, have independently proposed that the key to doing this lies in understanding each gene's mutation tolerance: the extent to which a hit to that gene impairs an individual's ability to survive and reproduce. Because individuals with autism tend to have lower reproduction rates than the general population, these researchers have posited that autism risk genes should be among the less tolerant genes in the genome.

One simple way to measure a gene's tolerance is to observe how many loss of function mutations appear in that gene in the general population, relative to the gene's length. If loss of function mutations appear often, that means mutations to the gene are largely survivable; if, instead, loss of function mutations are seldom or never seen, that suggests that individuals with mutations to that gene were unable to survive.
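A length-normalized count of this kind might look like the sketch below. The gene names, lengths and counts are made up for illustration, and the per-kilobase normalization is just one plausible reading of "relative to the gene's length," not the published method.

```python
from dataclasses import dataclass


@dataclass
class GeneRecord:
    name: str
    coding_length_bp: int  # length of the protein-coding sequence, in base pairs
    lof_count: int         # loss of function mutations seen in the general population


def lof_per_kilobase(gene: GeneRecord) -> float:
    """Loss of function mutations per 1,000 base pairs of coding sequence."""
    return gene.lof_count / (gene.coding_length_bp / 1000)


def rank_by_intolerance(genes: list[GeneRecord]) -> list[GeneRecord]:
    """Least tolerant genes first: the fewest mutations relative to length."""
    return sorted(genes, key=lof_per_kilobase)


# Hypothetical records, not real genes or real counts.
genes = [
    GeneRecord("GENE_A", coding_length_bp=6000, lof_count=1),
    GeneRecord("GENE_B", coding_length_bp=1500, lof_count=12),
    GeneRecord("GENE_C", coding_length_bp=3000, lof_count=6),
]

for gene in rank_by_intolerance(genes):
    print(f"{gene.name}: {lof_per_kilobase(gene):.2f} mutations per kilobase")
```

In this toy example the long gene with a single observed mutation ranks as the least tolerant, even though a shorter gene could easily show a lower raw count.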

Using this tolerance measure, Iossifov compared the loss of function mutations in children with autism with the loss of function mutations in unaffected controls. Sure enough, Iossifov and Wigler's team reported in the September 23, 2015, issue of the Proceedings of the National Academy of Sciences that the mutated genes in the children with autism had a lower tolerance to mutation, on average, than the genes that were mutated in controls. Using the genes' tolerance scores, Iossifov and Wigler have identified 239 genes that have at least an 80 percent likelihood of being true autism risk genes.

Meanwhile, Goldstein has developed a different scoring system for genetic tolerance. His system measures a gene's underlying mutability (how easily it breaks) based on physical principles, and then compares that mutability with how many loss of function mutations appear in the general population. By logic similar to that of Wigler's approach, if a gene has far fewer mutations than its underlying mutability predicts, then presumably the people with mutations in that gene did not survive to form part of the general population.
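The comparison of observed mutations against mutability can be sketched as a simple ratio. This is an illustration of the general idea only, not Goldstein's published scoring system; the mutability values, the population size and the function names are all assumptions.

```python
def expected_lof(mutability_per_person: float, individuals: int) -> float:
    """Mutations expected in a population if none of them were weeded out."""
    return mutability_per_person * individuals


def depletion_ratio(observed: int, mutability_per_person: float, individuals: int) -> float:
    """Observed divided by expected; values well below 1 suggest intolerance."""
    expected = expected_lof(mutability_per_person, individuals)
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


POPULATION = 50_000  # hypothetical number of sequenced individuals

# Two hypothetical genes with the same mutability but different observed counts.
tolerant = depletion_ratio(observed=19, mutability_per_person=0.0004, individuals=POPULATION)
intolerant = depletion_ratio(observed=2, mutability_per_person=0.0004, individuals=POPULATION)
print(f"Tolerant gene ratio:   {tolerant:.2f}")
print(f"Intolerant gene ratio: {intolerant:.2f}")
```

A ratio near 1 means the gene carries about as many mutations as its mutability predicts, while a ratio near 0 means most of the expected mutations are missing from the population.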

As Goldstein and his colleague Slavé Petrovski of the University of Melbourne described in a paper in the September 2, 2015, issue of PLOS Genetics, this scoring system offers the potential to shed light on intolerance not only in the 2 percent of the genome consisting of genes, but also in the other 98 percent of the genome, much of which performs important regulatory functions. (Goldstein cautions that extending the scoring method to the whole genome will require a much larger dataset of whole-genome sequences from healthy individuals than currently exists.)

Eventually, this scoring system could help to prioritize mutations that emerge from whole-genome sequencing of the SSC, now under way with support from SFARI and the National Institutes of Health. The New York Genome Center is on track to complete whole-genome sequencing of the entire collection within about another year.

Autism stemming from mutations in the noncoding regions of the genome "virtually has to be happening," Goldstein says. "We want to track those mutations down."
