Organizers:
Greg Bryan, Columbia University
Speakers:
Shy Genel, Simons Foundation, Center for Computational Astrophysics
Chang-Goo Kim, Princeton University
Matthew Ho, Columbia University
Sophie Koudmani, University of Cambridge
Elisabeth Krause, University of Arizona
Stuart McAlpine, Stockholm University
Shivam Pandey, Johns Hopkins University
Laurence Perreault-Levasseur, Université de Montréal
Laura Sommovigo, Simons Foundation, Center for Computational Astrophysics
Ulrich Steinwandel, Max Planck Institute for Astrophysics
-
The Simons Collaboration on Learning the Universe (LtU) held its fourth annual meeting on September 18–19, 2025. About seventy members of the collaboration and thirty external scientists from major cosmological surveys and simulation groups participated. The meeting showcased scientific advances across LtU’s working groups and offered an opportunity to build stronger connections with the broader cosmology community.
Scientific Program
The first day opened with a presentation by Greg Bryan and Laurence Perreault-Levasseur, who described how the collaboration is developing fast, physically interpretable forward models to connect theory and observation. By combining emulators, generative models, and causal discovery methods, their team is constructing a framework capable of bridging traditional simulations and upcoming large-scale survey data at dramatically reduced computational cost.
The theme of bridging simulations and observations continued with a talk by Laura Sommovigo and Shivam Pandey. They presented new pipelines for generating synthetic galaxy and CMB surveys, including SDSS-like mock catalogs and CMB maps created with both field-level and halo-based methods. A particular focus was the treatment of dust attenuation and its cosmological implications, highlighting how realistic mocks can be used both to validate methodologies and to extract astrophysical information from upcoming datasets.
Matthew Ho then presented results from the collaboration’s flagship simulation-based inference analysis of the BOSS CMASS galaxy sample. By integrating emulators with high-resolution simulations, his team obtained precise cosmological constraints and identified methodological lessons that will inform the analysis of future surveys. The presentation emphasized both the promise and the practical challenges of field-level inference, including validation, scaling, and data management.
Attention then turned to the modeling of baryonic processes. Chang-Goo Kim and Ulrich Steinwandel described progress on physically motivated subgrid models of star formation and galactic winds, as well as pathfinder studies that combine BORG-generated initial conditions with cosmological simulations. Their work highlighted how small-scale physics can be systematically connected to cosmological environments, laying the foundation for next-generation flagship simulations.
The final talk of the first day was given by Stuart McAlpine, who described the BORG group’s Manticore project. This effort has produced the most detailed reconstructions of the local Universe to date, combining survey data with simulations in a Bayesian framework. The resulting “digital twin” provides dynamically consistent maps of matter and velocity across hundreds of megaparsecs, offering new insights into the growth of structure and the environments of nearby clusters.
The second day began with an invited presentation from Elisabeth Krause, who surveyed the landscape of current and upcoming cosmological surveys. She highlighted both the opportunities of combining probes such as clustering and weak lensing, and the challenges of systematics, modeling, and inference at survey scale. Her perspective underscored the importance of LtU’s efforts to connect simulations and data analysis.
Sophie Koudmani followed with a talk on the black hole–galaxy connection. She described new theoretical models and numerical methods, including cyclic zoom simulations and machine-learning–accelerated black hole models, that shed light on the growth and feedback of supermassive black holes. These efforts are particularly timely in light of JWST observations of unexpectedly luminous early quasars.
The scientific program concluded with a talk by Shy Genel on the future of training cosmological emulators. He outlined ongoing efforts to construct emulators based on the new CAMELS simulations, which span broad ranges of astrophysical and cosmological parameters. These emulators will enable robust, survey-scale modeling and represent an important step toward integrating LtU’s methods into observational pipelines.
External Feedback and Next Steps
The meeting closed with a debrief involving external scientists. Discussion emphasized the urgency of engaging more directly with survey collaborations, through embedded members, memoranda of understanding, and participation in survey working groups. Participants encouraged LtU to identify a small number of flagship goals, to enhance integration across working groups, and to communicate its main scientific achievements more clearly. A recurring suggestion was to organize community-wide mock data challenges, which could both benchmark LtU methodologies and engage external researchers.
Conclusion
The 2025 meeting demonstrated the breadth and maturity of LtU’s scientific program. From small-scale models of star formation and black hole feedback to large-scale emulators and field-level inference, the collaboration is assembling the pieces of an end-to-end framework that links physical theory with survey data. The next steps will be to embed these methods within survey collaborations and to sharpen the focus on flagship projects, ensuring that the collaboration plays a central role in the analyses that will define cosmology in the decade ahead.
-
The Learning the Universe collaboration is developing new tools to jointly infer the fundamental physical processes driving both cosmological structure formation and galaxy evolution, leveraging data from current and upcoming galaxy redshift and CMB surveys. Over our first four years, we have made significant progress toward these goals.
The primary aim of this collaboration meeting is to strengthen our connections with ongoing and planned surveys and to engage with the broader community that could benefit from our tools. We also hope to gain critical feedback that will help us understand the limitations of our current approaches and identify opportunities for improvement.
Our recent advances include the development of tools for rapid simulation emulation, the effective application of implicit inference techniques, and the generation of mock galaxy and CMB observables. We have also carried out a reconstruction of cosmological initial conditions across an unprecedented volume. On the galaxy formation side, we’ve developed a novel, first-principles model for self-regulated star formation, introduced an innovative subgrid model for galactic winds, and improved the physical realism of black hole seeding, accretion, and feedback. These developments are being implemented and tested in both differentiable semi-analytic frameworks and cutting-edge hydrodynamics codes, and are forming the basis of new training simulation suites for inference.
During this meeting, we will review these developments and discuss how they can be deployed in practice—and refined—with input and collaboration from the broader community. We are excited to work together to advance the field and deepen our understanding of the universe.
-
Thursday, September 18, 2025
9:30 AM Laurence Perreault-Levasseur
Learning how to Accelerate the Universe
11:00 AM Laura Sommovigo & Shivam Pandey
From Simulations to Surveys: Building Realistic Mock Catalogs for Galaxies and the CMB
1:00 PM Matthew Ho
A simulation based re-analysis of the CMASS data
2:30 PM Chang-Goo Kim & Ulrich Steinwandel
Bridging Galactic and Cosmological Scales: Baryonic Modeling for Next-Generation Simulations
4:00 PM Stuart McAlpine
Creating a Bayesian digital twin of our Universe
Friday, September 19, 2025
9:30 AM Elisabeth Krause
The landscape of current and coming cosmological surveys
11:00 AM Sophie Koudmani
Modeling the Black Hole-Galaxy Connection
1:00 PM Shy Genel
Future Directions in Training Cosmological Emulators
-
Shy Genel
Center for Computational Astrophysics
Future Directions in Training Cosmological Emulators
View Slides (PDF)
The Learning the Universe (LtU) approach demands robust, physics-motivated modeling of baryons across very large cosmological volumes, regimes that cannot, in the foreseeable future, be directly simulated with forward models such as semi-analytical models or full hydrodynamical simulations. This challenge drives the development of accelerated models, or emulators, trained on high-fidelity simulations, which in turn requires generating extensive training sets. In this talk, I will present recent progress, ongoing efforts, and future directions within LtU toward building this pipeline. In particular, I will highlight the new generation of CAMELS simulations, which span 50 Mpc/h boxes while sampling a broad cosmological and astrophysical parameter space, and describe current efforts to construct cosmological emulators based on them. I will also discuss remaining challenges and outline how interdisciplinary collaboration across LtU working groups can accelerate progress toward our shared goals.
Chang-Goo Kim
Princeton University
Ulrich Steinwandel
Max Planck Institute for Astrophysics
Bridging Galactic and Cosmological Scales: Baryonic Modeling for Next-Generation Simulations
View Slides (PDF)
The SF/ISM/GW working group is responsible for developing physically principled subgrid models of star formation, the interstellar medium, and galactic winds, which are essential ingredients for describing baryonic cycles. To this end, we employ a wide range of simulations, from local galactic patches to whole galaxies, where the ISM physics can be treated explicitly at varying levels of complexity (from pure hydrodynamics to MHD with radiation, non-equilibrium chemistry, and cosmic rays). In this talk, we highlight two of our recent activities among many ongoing efforts: (1) the implementation of a new star formation model based on pressure-regulated, feedback-modulated theory for very low-resolution cosmological simulations, and (2) CRMHD simulations of local galactic patches aimed at recalibrating multiphase wind loading factors.
The Cosmology working group is paving the way for the next generation of large-scale simulations, in close collaboration with the Star Formation, Black Hole, and BORG groups. Our efforts center on two main pillars aimed at advancing the LtU flagship simulations. First, we are combining BORG’s expertise in initial condition generation with state-of-the-art cosmological simulations to study the formation of large-scale structure in the local Universe and to better understand the baryon cycle in these environments. We will present results from our first pathfinder study and outline directions for improvement. Second, we are adapting multi-zoom techniques to simulate a pre-selected halo mass function, with the goal of achieving both completeness and computational efficiency.
Matthew Ho
Columbia University
Simulation-Based Inference with BOSS: Cosmology Results, Lessons Learned, and Future Directions
View Slides (PDF)
This talk will provide a comprehensive overview of our flagship Learning the Universe simulation-based inference analysis of the SDSS BOSS CMASS galaxy sample. By combining high-resolution physical models with fast emulators, we generate the diverse training sets required for field-level inference, enabling us to constrain cosmological models with unprecedented precision. I will report our main cosmological results in the context of previous classical and ML-guided analyses of spectroscopic galaxy clustering. I will also discuss the lessons learned from this process, including key insights gained into galaxy formation models, scaling of statistical constraining power, data management requirements, designs of optimal summary statistics, and the advantages of hyperparameter tuning. This pilot study has also highlighted challenges in the ML workflow related to computational expense, model validation, and data handling. I will conclude by presenting our forward-looking strategy and describing how the lessons from this work will prepare our pipeline for upcoming cosmological surveys.
Sophie Koudmani
University of Cambridge
Modeling the Black Hole-Galaxy Connection
View Slides (PDF)
Most galaxies, including our own Milky Way, host supermassive black holes at their centres. These black holes release powerful jets, winds, and radiation as they consume surrounding gas, disrupting star formation and shaping not only the history of their host galaxies but also the distribution of matter on cosmological scales. Recently, the James Webb Space Telescope revealed more luminous black holes in the early Universe than anticipated, exposing a major gap in our understanding. Over the past year, we have introduced new theoretical models to probe the origin of supermassive black holes. We have also advanced techniques for modelling their dynamics, growth histories and the feedback processes that are central to assessing the broader cosmological implications within the Learning the Universe collaboration. In parallel, we have created novel numerical tools to drive the state of the art forward, including cyclic zoom-in simulations and machine-learning-accelerated black hole models. Finally, we have designed methods to trace in detail the evolution of black-hole-driven winds and their observational signatures, with the aim of unravelling the black hole–galaxy connection.
Elisabeth Krause
University of Arizona
The Landscape of Current and Coming Cosmological Surveys
View Slides (PDF)
The coming decade of cosmological surveys will deliver data of unprecedented depth and precision, creating new opportunities to probe the physics of cosmic acceleration and structure formation. In this talk, I will first summarize recent progress in combining multiple probes of large-scale structure, such as galaxy clustering and weak lensing, to place robust constraints on cosmology. I will then turn to the obstacles that limit our ability to fully exploit these data, including challenges in connecting observations to simulations, modeling astrophysical systematics, and developing scalable statistical inference methods. Finally, I will outline promising directions, drawing on advances in simulations, statistical modeling, and machine learning, that could help overcome these barriers and bring us closer to building a framework that uses all of the available cosmological information.
Stuart McAlpine
Stockholm University
Creating a Bayesian Digital Twin of Our Universe
View Slides (Offsite link)
Over the past three years, the BORG working group has advanced the frontier of field-level cosmological inference, culminating in the Manticore project: the most detailed physical reconstructions of the Local Universe to date. By combining galaxy survey data with high-fidelity simulations in a Bayesian framework, we have been able to infer dynamically consistent maps of matter and velocity across hundreds of megaparsecs, providing new insights into the growth of cosmic structure and the properties of nearby clusters such as Virgo and Coma. In this talk, I will highlight the key achievements of the group, including our breakthroughs in modeling, validation, and posterior predictive testing, and outline the next phase of our program, which aims to extend these reconstructions to deeper surveys, broaden the physical models, and push towards a fully realized digital twin of the Universe.
Laurence Perreault-Levasseur
Université de Montréal
Learning How to Accelerate the Universe
View Slides (PDF)
The Learning the Universe collaboration aims to connect first-principles theory with observations by building fast, accurate, and interpretable forward models. Our goal is to create the computational foundations needed to unlock the full scientific potential of next-generation cosmological surveys. Traditional simulations, though powerful, are too computationally expensive to run at survey scale. To address this, we are developing machine-learning–accelerated forward models that preserve physical realism while dramatically reducing cost. This includes emulators for matter clustering, halos, and galaxies, as well as score-based generative models that map dark-matter–only simulations onto hydrodynamical predictions, producing realistic galaxy fields far more efficiently than standard approaches. We are also applying causal discovery methods to disentangle the interplay between galaxies and supermassive black holes, demonstrating how data-driven techniques can complement accelerated forward models. Together, these advances show how the collaboration is building a new generation of forward models that bridge simulations and observations in the era of large surveys.
Laura Sommovigo
Center for Computational Astrophysics
Shivam Pandey
Johns Hopkins University
From Simulations to Surveys: Building Realistic Mock Catalogs for Galaxies and the CMB
View Sommovigo Slides (PDF)
View Pandey Slides (PDF)
A central challenge in cosmology is connecting first-principles simulations with the observations that guide our understanding of the Universe. In the first phase of the Learning the Universe collaboration, we have built pipelines to generate realistic synthetic data sets, including mock galaxy surveys modeled on SDSS. We will present our end-to-end mock survey framework, which produces SDSS-like light cones with predicted galaxy photometry. A key focus has been modeling how interstellar dust alters galactic light, combining radiative transfer simulations with simplified prescriptions that balance realism and efficiency. We will also highlight recent work exploring the interplay between dust parameters and cosmological and astrophysical parameters within our Simulation-Based Inference framework. In addition to galaxy observations, the scattering and lensing of the CMB provide cosmological and astrophysical information that is highly complementary to galaxy surveys; to extract this, we need reliable CMB mock maps. We will show how we create such mocks using various techniques (e.g., both field-level and halo-based pasting methods), and how we develop and test analytical prescriptions for joint modeling of baryons and galaxies. These will then be used to obtain astrophysical and cosmological constraints with CMB observations from ACT by going beyond traditional two-point statistics.