How Mathematical Modeling Identified Subclasses of Autism

Photo of three people with trees behind them. — The Li family poses for a photo in 2025. Important research into understanding autism is made possible by the generosity of SPARK participants, such as the Li family, whose de-identified data are made available to researchers. Scott DeFillippo, DeFillippo Photo and Video

Two people who have been diagnosed with autism can have very little in common. That gap — between a shared label and vastly different experiences — has long been one of the field’s biggest unsolved challenges. Factor in a wide range of potential genetic contributors, and the puzzle becomes even more complicated.

In their July Nature Genetics study, researchers at the Flatiron Institute’s Center for Computational Biology (CCB) used advanced mathematical models to analyze data from more than 5,000 research participants with autism. The work revealed four distinct groups that link autism-related traits with underlying genetics. This work could open the door for more precise diagnoses and personalized support in autism interventions.

“This study shows how computational approaches can help connect what we observe clinically to underlying genetics and molecular biology,” says Olga Troyanskaya, a senior research scientist and deputy director for genomics at the CCB who served as the study’s senior author. “Because autism and many other complex human conditions are so heterogeneous, methods that enable more comprehensive, data-driven stratification can reveal biologically distinct subtypes — an important step toward clearer mechanisms and, ultimately, more precise paths to diagnosis and care.”

Portrait of Olga Troyanskaya — Senior Research Scientist and Deputy Director for Genomics Olga Troyanskaya of the Flatiron Institute’s Center for Computational Biology. Princeton University, Sameer A. Khan/Fotobuddy (2020)

A Person-Centered Approach

Neither of the study’s two first authors, Natalie Sauerwald and Aviya Litman, got their start in genetics or neuroscience. They both began their academic journeys in mathematics, completing undergraduate degrees in the subject before embarking on graduate work in genomics.

“Genomics is especially well suited to a mathematical and computational background — it’s a field defined by vast amounts of data that require complex algorithms and analytical frameworks to understand,” says Sauerwald, a CCB associate research scientist.

This is especially true for autism, which presents a unique challenge from a genomics perspective due to its high variability. There isn’t a single “autism gene”; instead, many genetic variations (called genotypes) could play a role. There is also a wide range of observable traits (called phenotypes) associated with autism, such as sensory issues or impaired language. Understanding why autism can manifest so differently among individuals is a big focus for the field, and matching genotypes with phenotypes is no easy feat.

“A lot of studies will take a trait-centered approach in which they examine everyone who shares one particular trait, but that doesn’t capture an individual’s complexity,” says Litman, who is a graduate student in Troyanskaya’s lab.

Instead, the team wanted to try a more comprehensive, “person-centered” strategy that accounted for all of an individual’s traits rather than grouping people by just one. Their data, collected from SPARK — a landmark study supported by the Simons Foundation Autism Research Initiative (SFARI) — included genotypic and phenotypic information from more than 5,000 individuals with autism.

“Our goal from the start was to include everything — to preserve the complexity and nuance of the data, revealing what matters and how the pieces fit together,” says Sauerwald. “When you account for that level of complexity all at once, you naturally arrive at a person-centered approach.”

The team built a type of model called a “mixture model” that could integrate many data types collected in many ways. This was critical, as the study’s data took several forms. Some measures were binary, simply recording whether someone had a certain trait or not. Others were grouped into categories, such as levels of language ability. Still others were measured on a continuum, such as the timing of key developmental milestones. The scientists stressed that compiling and analyzing all these data in a meaningful way was only possible thanks to mathematics.

“Mathematics provides a language that allows us to structure that information and extract insight,” says Litman. “Without it, making sense of hundreds of phenotypes and millions of genetic variants would be impossible.”
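
To make the idea of a mixture model over mixed data types more concrete, here is a minimal sketch in Python. It is an illustration only and not the study’s actual model or code, whose details are not given here. It assumes that binary, categorical and continuous features are independent within each class, fits the model with a basic expectation-maximization loop, and runs on simulated data. The function names (fit_mixed_mixture, make_synthetic_data) and every parameter value are hypothetical.

```python
import numpy as np

def fit_mixed_mixture(x_bin, x_cat, x_num, n_classes, n_levels, n_iter=200, seed=0):
    """EM for a mixture over binary, categorical and continuous features.

    Assumption for this sketch: features are independent within a class
    (Bernoulli, categorical and Gaussian terms simply multiply).
    """
    rng = np.random.default_rng(seed)
    n = x_bin.shape[0]
    one_hot = np.eye(n_levels)[x_cat]                  # (n, d_cat, n_levels)
    resp = rng.dirichlet(np.ones(n_classes), size=n)   # random soft assignments
    eps = 1e-6
    for _ in range(n_iter):
        # M-step: re-estimate each class's parameters from soft assignments.
        nk = resp.sum(axis=0) + eps
        weights = nk / nk.sum()
        p_bin = np.clip(resp.T @ x_bin / nk[:, None], eps, 1 - eps)
        p_cat = np.einsum("nk,ndl->kdl", resp, one_hot) + eps
        p_cat /= p_cat.sum(axis=2, keepdims=True)
        mu = resp.T @ x_num / nk[:, None]
        var = np.maximum(resp.T @ x_num**2 / nk[:, None] - mu**2, 1e-3)
        # E-step: log-probability of each person under each class.
        log_p = np.log(weights) + x_bin @ np.log(p_bin).T \
            + (1 - x_bin) @ np.log(1 - p_bin).T
        log_p += np.einsum("ndl,kdl->nk", one_hot, np.log(p_cat))
        diff = x_num[:, None, :] - mu[None, :, :]
        log_p -= 0.5 * (diff**2 / var + np.log(2 * np.pi * var)).sum(axis=2)
        log_p -= log_p.max(axis=1, keepdims=True)      # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp.argmax(axis=1), resp

def make_synthetic_data(n_per_class=200, seed=1):
    """Simulate two classes with 3 binary, 2 categorical, 2 continuous traits."""
    rng = np.random.default_rng(seed)
    truth = np.repeat([0, 1], n_per_class)
    p_yes = np.where(truth[:, None] == 0, 0.9, 0.1)
    x_bin = (rng.random((truth.size, 3)) < p_yes).astype(float)
    cat_probs = np.array([[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]])
    x_cat = np.stack(
        [[rng.choice(3, p=cat_probs[t]) for t in truth] for _ in range(2)], axis=1
    )
    loc = np.where(truth[:, None] == 0, 0.0, 4.0)
    x_num = rng.normal(loc=loc, scale=1.0, size=(truth.size, 2))
    return truth, x_bin, x_cat, x_num

truth, x_bin, x_cat, x_num = make_synthetic_data()
labels, resp = fit_mixed_mixture(x_bin, x_cat, x_num, n_classes=2, n_levels=3)
# Class labels are arbitrary, so compare up to swapping the two labels.
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
print(f"agreement with simulated classes: {agreement:.2f}")
```

The point of the sketch is that each person is assigned to a class using all of their traits at once, with each data type contributing through its own probability model. A real analysis would also need to handle missing values, choose the number of classes and check the stability of the result, none of which is attempted here.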

The Four Groups

Based on their person-centered approach, the team identified four major categories of SPARK participants.

The largest group, Social and Behavioral Challenges, shows co-occurring traits such as ADHD, anxiety, depression, mood dysregulation, and communication and repetitive behavior challenges, but little to no developmental delay. This group accounts for about 37 percent of participants.

The Mixed ASD with Developmental Delay group, which made up 19 percent of participants, shows the opposite pattern: notable developmental delays but fewer emotional or behavioral difficulties.

The Moderate Challenges group (34 percent of participants) includes individuals with milder or fewer social and behavioral challenges and no developmental delays.

The smallest group, Broadly Affected (10 percent), experiences widespread challenges across social communication, repetitive behaviors, development and mental health.

Importantly, the scientists say that these are not set-in-stone groupings, but rather a starting point for further research.

“This doesn’t mean that there’s necessarily only four classes,” says Troyanskaya. “I think what this demonstrates is that there are at least four classes.”

Photo of five people with trees behind them. — The Austin family signed up for SPARK after learning that many autism research studies do not include girls of color in their cohorts. De-identified data from SPARK participants have enabled new insights into autism. Scott DeFillippo, DeFillippo Photo and Video

Autism Under the Hood

These four groups were established based on phenotype, and when the scientists started to dig into each group’s underlying biology, they were intrigued to see shared genetic characteristics.

The team started by looking at different genetic variants carried within each group and then traced these variants to the biological processes they affect in the body (such as how neurons fire or how a cell packages its DNA). Whether and how a biological process was affected varied significantly between groups, the team found.

“There was very little overlap,” says Litman. “While we looked at many biological processes previously implicated in autism, each one was largely associated with a different group.”

And it wasn’t just that individuals in each group shared certain genetic variants: The timing of gene activation also aligned.

For instance, in the Social and Behavioral Challenges group, the affected genes were largely active after birth, and individuals in this group showed minimal developmental delays and were diagnosed, on average, at a later age. In contrast, the Mixed ASD with Developmental Delay group was associated with genes that were primarily active before birth.

“It’s exciting — and surprising — to see that these biological signatures lined up so precisely,” says Sauerwald. “We didn’t expect they would be quite so distinct, but it speaks to the potential validity of the groups.”

Toward a More Personalized Understanding of Autism

In future work, the team will continue to refine the subtypes, and they hope that their study will encourage researchers and clinicians to think of autism in a more personalized way.

“The main takeaway is that autism should not be treated as a single, uniform condition,” says Sauerwald. “Our results support the idea that different forms of autism may arise from fundamentally different biological mechanisms — an idea we hope will guide future research.”

All of this research will rely heavily on mathematical and computational tools, and the scientists note that this need will only become more pressing as the field and the available data grow.

“As scientific data continues to grow in scale and complexity, computation becomes essential. It’s no longer possible to understand even a single genome — let alone thousands — without a mathematical framework to make the data interpretable,” says Sauerwald. “Computational approaches are becoming central across scientific fields, and institutions like the Flatiron Institute play a key role in enabling that work.”