Teresa Huang: Bridging Science and Machine Learning at Cosmic Scales

Huang explores how symmetries and structure can help reveal new insights into the universe, aiding discovery in everything from cosmology to the development of fusion energy.

Portrait of Teresa Huang taken on the Flatiron Institute roof
Teresa Huang is a Flatiron Research Fellow at the Center for Computational Mathematics, where she researches geometric deep learning and mechanistic understanding of machine learning models. Michael Lisnet/Simons Foundation

Cosmologists studying our universe need new tools to extract insights from simulations of processes such as galaxy formation and star decay. Meanwhile, the developers of machine learning tools need diverse, large-scale datasets to train more powerful models for scientific applications.

Teresa Huang of the Simons Foundation’s Flatiron Institute has brought together cosmologists and machine learning researchers in an ongoing collaboration that benefits both groups. Huang, who holds a doctorate in applied mathematics and statistics from Johns Hopkins University, is a research fellow at the Flatiron Institute’s Center for Computational Mathematics (CCM).

Huang focuses on machine learning on structured data with symmetries and invariance, an area called geometric deep learning. Such structured data are ubiquitous in science, including cosmology, biology, neuroscience and plasma physics.

We recently discussed Huang’s research path, the importance of mentors and her experience in leading interdisciplinary teams. Our conversation has been edited for clarity.

How would you define your work?

I apply machine learning models to structured data such as point clouds and graphs that exhibit interesting symmetries and geometric properties. A point cloud is a collection of data points in space such as a cloud of stars in the sky. A graph is a collection of nodes representing entities and edges representing relations or interactions, for instance, a social network in which nodes are users and edges are their social relations.
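These two data types are simple to write down concretely. Here is a minimal NumPy sketch (the star coordinates and friendship graph are invented for illustration), including the pairwise-distance computation that is often the first step in turning a point cloud into a graph of nearby neighbors:

```python
import numpy as np

# Point cloud: N points in 3-D space, e.g. star positions (x, y, z).
stars = np.array([[0.1, 2.3, 5.0],
                  [4.2, 1.1, 0.7],
                  [3.3, 3.3, 2.2]])

# Graph: users as nodes, friendships as edges (an adjacency list).
friends = {
    "ana": ["ben", "cho"],
    "ben": ["ana"],
    "cho": ["ana"],
}

# Pairwise distances between all points -- a typical first step
# when connecting nearby points into a neighbor graph.
diff = stars[:, None, :] - stars[None, :, :]   # shape (N, N, 3)
dist = np.sqrt((diff ** 2).sum(axis=-1))       # shape (N, N)
```

The distance matrix is symmetric with zeros on its diagonal, reflecting the fact that distance does not depend on the order of the two points.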

The graphs or point clouds I work with are much too complex for humans to grasp all their relationships or patterns, but machine learning tools can uncover these patterns at scale and use them to make predictions or decisions.

You focus on machine learning analysis of geometric data. What inspired that interest?

This comes from my doctoral work at Johns Hopkins, when I was fascinated by the inherent beauty of geometric symmetries. Symmetries are the ways you can transform structured data without changing its meaning. For example, rotating a cat image still produces a cat image. Humans have an inherent appreciation for symmetry, which is a fundamental concept in mathematical fields like geometry, algebra and probability, as well as in other domains including physics, biology and art.

While the beauty of symmetry is something that humans grasp intuitively, machine learning models are not guaranteed to recognize these symmetries, even when trained on a large amount of data. The field of geometric deep learning aims to design machine learning models that respect and exploit symmetries. At Johns Hopkins, I worked mostly with graph neural networks, a type of machine learning model that respects permutation symmetry in graph data, and I focused on studying the mathematical properties of these models. “Permutation symmetry” means that the graph remains the same when we permute its node labels.
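Permutation symmetry can be checked directly in code. Below is a minimal NumPy sketch of one message-passing layer with sum pooling (the toy graph, features and weights are invented for illustration, not taken from any real model): relabeling the nodes with a permutation matrix leaves the graph-level output unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: 4 nodes, adjacency matrix A, 3-D node features X.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
X = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 3))  # shared weights for all nodes

def gnn_layer(A, X, W):
    # One message-passing step: each node sums its neighbors'
    # features, then applies a shared linear map + nonlinearity.
    return np.tanh(A @ X @ W)

def graph_readout(A, X, W):
    # Sum-pool node embeddings into one graph-level vector.
    return gnn_layer(A, X, W).sum(axis=0)

# Relabel the nodes with a permutation matrix P.
P = np.eye(4)[[2, 0, 3, 1]]
out_original = graph_readout(A, X, W)
out_permuted = graph_readout(P @ A @ P.T, P @ X, W)

assert np.allclose(out_original, out_permuted)  # same graph, same output
```

The invariance holds because the layer applies the same weights to every node and the readout sums over nodes, so reordering the rows cannot change the result.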

Interesting geometric properties like symmetry are everywhere in the natural sciences. Yet so far, the scientific applications of geometric deep learning have mostly been confined to domains such as biology, chemistry and social science. At Flatiron, I aim to introduce techniques of geometric deep learning to diverse scientific domains and accelerate scientific discovery. Cosmology ended up being the first domain I explored, thanks to the support of my mentor.

A portrait of Teresa Huang writing on a white board.
Huang received her Ph.D. in applied mathematics and statistics at Johns Hopkins University, where she focused on exploring symmetries in graph learning. Michael Lisnet/Simons Foundation

Tell us more about that. 

Before my Ph.D., I earned a master’s degree in data science from New York University. During that time, I had the pleasure of working with Soledad Villar, who became my Ph.D. advisor at Johns Hopkins. Somewhere along the way, Villar met Francisco Villaescusa-Navarro, a cosmologist at Flatiron whom we call Paco.

Paco is an expert in cosmological simulation data who is always looking for new tools to solve open questions in cosmology. Meanwhile, I was developing machine learning tools like graph neural networks that could help with the challenges Paco described. Villar introduced me to Paco, and our collaboration was very natural. This introduction became the seed of the CosmoBench project, which I spearheaded at Flatiron.

What’s the focus of CosmoBench?

Cosmologists have long produced simulations of events like galaxy formation, yielding vast amounts of structured data. But there has been no unified benchmark for consolidating these different datasets, nor a machine learning interface to query and extract insights from them.

CosmoBench is the largest cosmology benchmark for machine learning to date, curated from more than two petabytes of data and comprising 34,000 point clouds and 25,000 directed trees. CosmoBench shows that machine learning is competitive with established cosmological methods for challenging tasks on these datasets, highlighting the potential of combining machine learning and physical knowledge at scale.

What was it like to lead the CosmoBench team?

I have led teams before, but everyone on those teams was in machine learning. So this was my first interdisciplinary team. I thought it was important to have a shared vision and provide space for people to share their expertise in a relaxed way.

We started with informal weekly meetings, in which we learned about each other’s fields (particularly the languages) and the different types of problems the cosmologists were trying to solve. This built camaraderie, and it was great that almost everyone was at Flatiron in person. It was easy to have informal chats and organically invite people to the team who had different types of expertise.

Everyone loved the vision of bridging machine learning and cosmology. After a while, though, we needed a concrete project to show how datasets and problems from cosmology can benefit from tools in geometric deep learning.

This is when Lawrence Saul, a senior scientist at Flatiron, suggested we develop a benchmark dataset. This was a great suggestion, and it got us on our way to making CosmoBench. I am also very grateful for everyone working so hard to accomplish this high-quality benchmark in such a short time frame.

Portrait of Teresa Huang working at her laptop
At the Flatiron Institute, Huang uses geometric deep learning to accelerate scientific discovery, notably in cosmology. Michael Lisnet/Simons Foundation

What other projects are you working on?

A CCM team led by Andrew Giuliani and Misha Padidar is working on designing stellarators, devices built from twisted magnetic coils that can produce clean fusion energy. Stellarators use magnets to confine superheated plasma in a doughnutlike shape, enabling nuclear fusion. This is much better for the environment than carbon-based energy sources like oil or gasoline. But if plasma leaks out (i.e., plasma particles escape) from a poorly designed stellarator, the device becomes less effective.

Stellarators have intricate geometric shapes, so optimizing them to prevent plasma leakage is challenging. If we can solve some difficult mathematical optimization problems quickly and cheaply, then we can rapidly identify good stellarator designs and build stellarators that produce abundant, clean energy.

Andrew has developed a simulation database of 370,000 stellarators with diverse geometric properties. Now we are using machine learning to pluck out the design choices that yield good stellarators. With this, we will have a warm start towards the best-performing stellarators that can be built in practice.

Think of it like designing a car. Building a car totally from scratch is hard. But using models of existing cars — Hondas, BMWs, Teslas — to make something new makes the job easier. We are using machine learning similarly to speed the process of building better stellarators.

Are there any final projects you’d like to discuss?

I am very fortunate to be part of the Open Interval symmetry project at the Simons Foundation, which brings artists and scientists together to develop a project. It’s an ongoing effort. The artist I am working with, Annie Dorsen, specializes in algorithmic theater, using computer programs to transform ordinary materials or raw data into live-generated scripts, music and stage structure. I look forward to seeing what we come up with together.

As a scientist, I always approach a problem by defining the research questions and the expected outcome. Annie showed me that artists resist this logic: Great art emerges not from predetermined forms but from attentive engagement with the materials, which in turn reveals the right questions and presentations. This insight is powerful in my scientific research for bridging the gap between theory and practice: Impactful theory emerges not from elegant equations but from active experimentation and meticulous attention to detail.
