Scientists Begin Building AI for Scientific Discovery Using Tech Behind ChatGPT

A fluid dynamics simulation. The team behind the Polymathic AI initiative plans to train its AI tool on a wide assortment of scientific simulations. Researchers will then be able to use the pre-trained AI as a starting point for their own projects. In a recent paper, members of the Polymathic AI team demonstrated how a broadly pre-trained AI can match or outperform an AI trained specifically for the complex task of replicating the physics of turbulent fluid flow. Credit: Michael McCabe; data from PDEBench

An international team of scientists announced today an ambitious new project that will leverage the same technology behind ChatGPT to build an AI-powered tool for scientific discovery. While ChatGPT deals in words and sentences, the team’s AI will learn from numerical data and physics simulations from across scientific fields to aid scientists in modeling everything from supergiant stars to Earth’s climate.

The team launched the initiative, called Polymathic AI, today alongside the publication of a series of related scientific papers on arXiv.org.

“This will completely change how people use AI and machine learning in science,” says Polymathic AI project lead Shirley Ho, a group leader at the Flatiron Institute’s Center for Computational Astrophysics (CCA) in New York City.

The idea behind Polymathic AI “is similar to how it’s easier to learn a new language when you already know five languages,” Ho says.

Starting with a large, pre-trained model (known as a foundation model) can be both faster and more accurate than building a scientific model from scratch. That can be true even if the training data aren’t obviously relevant to the problem at hand.

“Polymathic AI can show us commonalities and connections between different fields that might have been missed,” says Polymathic AI team member Siavash Golkar, a guest researcher at the CCA. “In previous centuries, some of the most influential scientists were polymaths with a wide-ranging grasp of different fields. This allowed them to see connections that helped them get inspiration for their work. With each scientific domain becoming more and more specialized, it is increasingly challenging to stay at the forefront of multiple fields. I think this is a place where AI can help us by aggregating information from many disciplines.”

The Polymathic AI team includes experts at the Simons Foundation and its Flatiron Institute, New York University, the University of Cambridge, Princeton University and the Lawrence Berkeley National Laboratory. The roster comprises experts in physics, astrophysics, mathematics, artificial intelligence and neuroscience.

Scientists have used AI tools before, but those tools have primarily been purpose-built and trained on data relevant to a single problem. “Despite rapid progress of machine learning in recent years in various scientific fields, in almost all cases, machine learning solutions are developed for specific use cases and trained on some very specific data,” says Polymathic AI team member François Lanusse, a cosmologist at the French National Center for Scientific Research. “This creates boundaries both within and between disciplines, meaning that scientists using AI for their research do not benefit from information that may exist, but in a different format, or in a different field entirely.”

Polymathic AI’s project will live up to its name by being truly cross-disciplinary. It will learn using data from diverse sources across physics and astrophysics (and, down the line, fields such as chemistry and genomics, its creators say) and apply that multidisciplinary savvy to a wide range of scientific problems. The project will “connect many seemingly disparate subfields into something greater than the sum of their parts,” says Polymathic AI team member Mariel Pettee, a postdoctoral researcher at the Lawrence Berkeley National Laboratory.

“How far we can make these jumps between disciplines is unclear,” Ho says. “That’s what we want to do — to try and make it happen. It’s exciting.”

Creating connections between seemingly disparate scientific disciplines is a reminder “that we have much to gain from looking outside of our subfields and thinking of our research as part of a larger scientific enterprise,” says Polymathic AI team member Michael McCabe, a Ph.D. student in machine learning and scientific computing at the University of Colorado, Boulder. “We hope they’ll enable researchers across broad ranges of science to ask questions that they wouldn’t be able to ask without the models we’re producing.”

ChatGPT has well-known limitations when it comes to accuracy (for instance, the chatbot says 2,023 times 1,234 is 2,497,582 rather than the correct answer of 2,496,382). Polymathic AI’s project will avoid many of those pitfalls, Ho says, by treating numbers as actual numbers, not just characters on the same level as letters and punctuation. The training data will also use real scientific datasets that capture the physics underlying the cosmos.
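The arithmetic above is easy to check directly, and a short sketch can illustrate the difference between handling a number as a string of characters and handling it as a numeric value. The Python below is only an illustration under stated assumptions: the helper names `char_tokens` and `numeric_encoding`, the `[NUM]` placeholder and the `scale` parameter are hypothetical, and nothing here should be read as the Polymathic AI team's actual method.

```python
# Exact integer arithmetic for the example quoted in the article.
product = 2023 * 1234
chatbot_answer = 2_497_582  # the incorrect value cited in the article
error = chatbot_answer - product


def char_tokens(n: int) -> list[str]:
    """Illustrative text-style view: every digit and comma is its own token."""
    return list(f"{n:,}")


def numeric_encoding(n: int, scale: float = 1e7) -> tuple[str, float]:
    """Hypothetical 'number as number' view: one placeholder token plus a scaled value.

    The "[NUM]" label and the scale factor are assumptions made for this sketch.
    """
    return ("[NUM]", n / scale)


if __name__ == "__main__":
    print(product)                    # 2496382
    print(error)                      # 1200
    print(char_tokens(product))       # nine single-character tokens
    print(numeric_encoding(product))  # one token carrying the magnitude
```

In the character view, the model sees nine unrelated symbols and has to infer magnitude from their order. In the numeric view, the magnitude is available directly as a value, which is the general intuition behind treating numbers as numbers.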

Transparency and openness are a big part of the project. “We want to make everything public,” Ho says. “We want to democratize AI for science in such a way that, in a few years, we’ll be able to serve a pre-trained model to the community that can help improve scientific analyses across a wide variety of problems and domains.”

Information for Press

For more information, please contact Stacey Greenebaum at [email protected].

Links to scientific papers: xVal, MPP and AstroCLIP