A Data Infrastructure for Autism Discovery

The Simons Foundation Autism Research Initiative has developed vast, shared datasets that allow many different research groups to help fill in the picture of autism.

A visualization of nine genes — DPP6, ITSN1, BRSK2, etc. — recently discovered to be linked to autism spectrum disorder based on pilot data from SPARK. Genes are colored based on their function, such as potassium ion transport, cell movement and steroid-mediated signaling, and are bundled according to associations between the genes. Source: P. Feliciano et al./bioRxiv.org 2019

About 15 years ago, a consensus began emerging that autism is not a single condition but rather a diverse spectrum with hundreds of different subtypes and underlying genes. Given this complexity, the Simons Foundation Autism Research Initiative (SFARI) concluded that the traditional approach to scientific discovery, in which each laboratory collects its own datasets and keeps them close to the vest, would not have sufficient power to map autism. What would be required instead were vast, shared datasets that would allow many different research groups to help fill in the picture of autism.

SFARI took on the task of creating such datasets. Crucially, it made an early decision to administer the datasets itself rather than depend on a governmental entity or an external group of investigators. Over the years, this direct stewardship has allowed SFARI to ensure that the datasets uphold the highest standards of quality and privacy, while simultaneously remaining flexible enough to meet the ever-expanding needs of autism researchers.

Today, a dedicated informatics group within the foundation distributes data from four different autism-related cohorts spanning thousands of families. The Simons Simplex Collection (SSC) is an assemblage of genetic and phenotypic data from more than 2,600 ‘simplex’ families that have one affected child along with unaffected parents and siblings. The Simons Variation in Individuals Project (Simons VIP), recently renamed Simons Searchlight, collects phenotypic data and biological samples from individuals with a mutation in one of more than 50 different autism-linked genes. The Autism Inpatient Collection (the only one of the four datasets that SFARI does not directly manage) is a cohort of individuals whose autism is severe enough to require long hospitalizations. Finally, SFARI’s most ambitious project yet, Simons Foundation Powering Autism Research for Knowledge (SPARK), aims to collect genotypic and phenotypic data from 50,000 families.

