Inventing Tools for Others: Supercharging Studies of Autism Genetics

To speed discovery, SFARI decided to break the traditional approach in which each laboratory collects its own datasets and keeps them close to the vest.

In the early 2000s, a consensus began to emerge that autism is not a single condition but rather a diverse one with hundreds of different subtypes and underlying genes. Given this complexity, the Simons Foundation Autism Research Initiative (SFARI) concluded that the traditional approach to scientific discovery — in which each laboratory collects its own datasets and keeps them close to the vest — lacked sufficient power to map autism. Instead, the field required vast, shared datasets that would allow many different research groups to help fill in the picture of autism.

SFARI took on the task of creating such datasets. Crucially, it made an early decision to administer the datasets itself rather than depend on a government entity or an external group of investigators. Over the years, this direct stewardship has allowed SFARI to focus on quality and privacy while simultaneously remaining flexible enough to meet the ever-expanding needs of autism researchers. “Many people participate in making sure the datasets are the highest quality we can reasonably make them before they are distributed to the community,” says Alex Lash, the foundation’s chief informatics officer.

Today, a dedicated informatics group within the foundation distributes data from four different autism-related cohorts spanning thousands of families. These datasets have given rise to more than 200 published papers about autism spectrum disorders. “The SFARI data collections have been transformative and really unparalleled in their utility,” says Brian O’Roak, a geneticist at Oregon Health and Science University.

Researchers can request access to the datasets through an online portal called SFARI Base, and once their requests are approved, the informatics group helps them get what they need from nearly 1 petabyte of data. SFARI has also invested in building a variety of online data visualization tools that enable scientists to search for gene variants, explore behavioral and medical histories, and filter data according to a wide array of options.

When their studies are completed, researchers send their own data to SFARI in turn. “The database gets constantly enriched with results from all the research it has generated,” says Marta Benedetti, a senior scientist at SFARI.


SFARI’s first dataset, the Simons Simplex Collection (SSC), which launched in 2006, offers researchers detailed phenotypic and genotypic information and biospecimens from more than 2,600 ‘simplex’ families, which consist of one affected child and unaffected parents and siblings.

“Leaders at SFARI were really trailblazers in recognizing that this would be a powerful approach,” O’Roak says. “If they hadn’t been forward-looking in recruiting this cohort, we wouldn’t have been able to make the tremendous breakthrough in identifying about 100 autism risk genes since then.”

The collection was “perfectly poised, perfectly timed, perfectly designed to answer a lot of key questions,” says Raphael Bernier, an autism researcher at the University of Washington.

In creating the SSC, SFARI put enormous effort into making sure that the many SSC clinical sites used the same protocols and diagnostic criteria so that the data would be rigorous and clean. “I think this has really set the standard for how to do high-quality data collection in a study like this,” says Casey White-Lehman, the SSC’s senior project manager.

Because of this attention to detail and the depth of the collection, the SSC is “a gift that keeps giving,” Benedetti says. “There’s so much that has come out and so much that can still be mined from this dataset.”

The SSC not only provided resources for researchers already studying autism but also lured new researchers into the field, launching a new generation of autism investigators. “My research career has been made on the back of the SSC,” says Stephan Sanders, a geneticist at the University of California, San Francisco. “When I look at the papers I’ve published, all the ones with the biggest impact have been as a result of the SSC.”

SFARI MEET-UP: Gerald Fischbach of the Simons Foundation Autism Research Initiative converses at the organization’s 2010 annual meeting in Washington, D.C.


As dozens of autism risk genes emerged from studies of the SSC, SFARI came to recognize that a second dataset was now feasible, one that inverted the principle the SSC was based on: Instead of recruiting people with autism and looking for mutations, recruit people with mutations linked to autism and study their symptoms. Simons Searchlight (previously known as the Simons Variation in Individuals Project, or Simons VIP) began in 2010 by collecting data from individuals with a particular autism-linked genetic variant. It now covers more than 50 different genes.

In addition to enabling researchers to take deep dives into specific subtypes of autism, the collection allows families with these subtypes to learn more about their disorder, from researchers and from each other. “In terms of actually making a difference right now in the lives of people with autism, VIP has been critical,” Sanders says. “For developing and supporting family groups, it’s the only game in town.”

Providing these resources for families is a small way of repaying them for the resources they have given to SFARI, White-Lehman says. “Without them we wouldn’t have any of these cohorts,” she says. “Some have been engaged with us for a decade, and we’re humbled and grateful that they’re willing to share their time with us.”

In recent years, SFARI has also supported and managed data distribution for the Autism Inpatient Collection, a cohort of people whose autism is so severe that they need to be hospitalized for long stretches of time. The collection, which is also funded by the Nancy Lurie Marks Family Foundation, offers phenotypic data from more than 528 individuals through SFARI Base and is currently sequencing the exomes — the protein-coding regions of the genome — for these individuals and their families.

“The idea is to study a population that is underserved and undercharacterized,” Benedetti says. “Less is known about this portion of the autism community because the burden on the families is already so much that it’s harder for them to participate in studies.”


In 2016, SFARI launched perhaps its most ambitious dataset yet: Simons Foundation Powering Autism Research (SPARK), the largest study of autism to date. SPARK aims to identify the causes of autism and to accelerate research that deepens understanding of the condition and improves lives. Already the project has enrolled more than 116,000 families.

By the end of 2019, the project hopes to have sequenced the exomes of about 40,000 people, a process that could lead to the discovery of hundreds of new autism risk genes. “When you have this order of magnitude of information, you can move so much faster, and so much more rigorously,” says Wendy Chung, SPARK’s principal investigator.

Like the SSC before it, SPARK is attracting new researchers into the autism field, Chung says. “It’s like having very nice flowers for the bees.”

Of equal importance to SPARK’s sequencing studies is the project’s research match program, which helps researchers find individuals in the massive SPARK cohort who are eligible for follow-up studies. “The response from the cohort has been fantastic,” Benedetti says.

IDEAS WORTH SPREADING: During her 2014 TED talk, SFARI director of research Wendy Chung presented what experts know and don’t know yet about autism. Photo credit: James Duncan Davidson

For instance, the program recently enabled Jacob Michaelson, an autism researcher at the University of Iowa, and his collaborators to contact about 5,000 families for a survey on autism-related symptoms such as sleep disruption, eating disorders and gastrointestinal problems.

“Working on our own, I’d probably be at the end of my career before I’d be able to collect this much data,” Michaelson says. The research match program is a “game-changer,” he says. “It’s the kind of infrastructure no lab could hope to have on its own.”

O’Roak’s laboratory had a similar experience when it set out to study the genetic underpinnings of autism in identical twins. “We had been trying in vain for a couple of years to recruit through our national collaboration networks,” he says. “But within a few weeks of doing research match, we already had 100 families who expressed interest.”

Michaelson now has research match in mind as he thinks of future projects. “We’re thoroughly converted to this way of doing research,” he says. “I think SPARK research match is going to absolutely transform autism research.”

This article is part of the “Five Unique Contributions We’ve Made to Science” section of the foundation’s 25th anniversary book.