Flatiron Institute

“The idea was to create a research environment where we could take on long-term problems that were drawn from science and treat them with mathematical rigor.”

– Leslie Greengard

In 2013, the Simons Center for Data Analysis (SCDA) was launched at the Simons Foundation with the goal of creating new computational tools to analyze the vast amounts of data being generated and collected in biology. As SCDA thrived, the foundation decided to use the same model in other areas of science. SCDA, later renamed the Center for Computational Biology (CCB), would eventually become the seed project for the much larger Flatiron Institute, a new division of the Simons Foundation that began operations in 2016 and so far employs 60 scientists.

Unlike the Simons Foundation’s grants programs that support researchers at academic institutions, Flatiron is an intramural research program, and its working scientists are full-time foundation employees. The institute is a place for researchers and programmers to work closely on computational problems in the basic sciences. “The idea was to create a research environment where we could take on long-term problems that were drawn from science and treat them with mathematical rigor,” says Leslie Greengard, founding director of SCDA and now director of the CCB at the Flatiron Institute.

Currently, the Flatiron Institute comprises the CCB and the Center for Computational Astrophysics (CCA), directed by David Spergel, an astrophysicist at Princeton University. In addition to these centers, Flatiron will soon have a Center for Computational Quantum Mechanics, to be co-directed by Antoine Georges of the Collège de France and Andrew Millis of Columbia University (also currently associate director for physics at the Simons Foundation).

A fourth research center on a yet-to-be-decided subject will complete the institute, filling the last two of its eight research floors. The remaining two floors and roof will be home to a dining hall, a boardroom and a state-of-the-art lecture hall that seats 100 people. The institute also has a data center in the basement with the capacity to power and cool 250 kilowatts of equipment. Architects at New York City-based Perkins Eastman filled 162 Fifth Avenue with glass-walled conference rooms and offices, classrooms, and large, open spaces with comfortable couches. Blackboards encourage collaboration and virtual aquaria encourage meditative thought.

A technique developed by the Flatiron Institute’s Center for Computational Biology and Dennis Shasha of New York University to identify peptidomimetic scaffolds to disrupt protein-protein interactions has future applications in therapeutics.

Proteins use ions to carry out chemical reactions. A technique developed by the Flatiron Institute’s Center for Computational Biology and Dennis Shasha of New York University identifies peptidomimetic scaffolds that can coordinate zinc ions. This work pursues a goal of developing catalytic peptidomimetics.

This model shows a glucagon-like peptide bound to a glucagon-like peptide receptor called GLP-1R. GLP-1R is involved in insulin secretion and is a target for type 2 diabetes therapeutics. The Trauner Lab at New York University has designed a photoswitchable liraglutide (a therapeutic that mimics GLP-1) that permits control of when the peptide binds. The Center for Computational Biology’s P. Douglas Renfrew, Julia Koehler Leman and Daniel Berenberg are working with the Trauner Lab to model the interaction, and potentially to improve it. PDB Code 5VAI.

Research at the Flatiron Institute is supported by the Scientific Computing Core (SCC), co-directed by Nick Carriero, formerly of Yale University, and Ian Fisk, formerly of CERN. The SCC handles the significant computing infrastructure needs of the institute, as well as some of the computation- and data-intensive activities of the foundation. The SCC maintains about 4,200 cores and 4 petabytes of storage (a petabyte is 1 million gigabytes) at the institute, with additional cores and storage at satellite facilities located on Long Island and in San Diego. Plans are under way to add more computing cores in the upcoming months.

“Academia has not found a way to provide a comfortable home for the computational sciences in house,” Greengard says. The Flatiron Institute answers that unmet need for high-quality, well-supported software designed for the types of problems that arise in basic science research. The traditional ‘publish-or-perish’ academic model makes long-term computational projects untenable for scientists who are trying to write grants and get tenure. Furthermore, after a program is written, it needs updating and ongoing support. The standard academic model tends to prioritize new productions at the expense of the less glamorous but necessary work of maintaining and improving existing codes. The Flatiron Institute aims to provide a place where scientists and programmers are free to prioritize that computational work. All this is with an eye to advancing science everywhere, and the codes being written will be made available to the broader community as open-source software.

Flatiron is expected to employ 250 scientists and programmers by the time all four centers are established and staffed. Although the Flatiron scientists’ work tends to be contained within the confines of 162 Fifth Avenue, the scientists have also, by design, been absorbed into foundation life across the street, attending lunch, lectures and staff meetings alongside their Simons Foundation colleagues.

Each Flatiron center supports various smaller research groups. The CCB tackles questions in biophysics, genomics, neuroscience, systems biology, signal processing and structural biology. Researchers from the CCB’s genomics group, for example, have used machine learning to understand and predict noncoding variants in DNA, and their biophysics group looks at modeling fluid-structure interactions at a sub-cellular level. Another avenue of research for the CCB is image and signal processing, especially in microscopy, where the raw data tends to be quite noisy and automatic analysis is difficult and error-prone. “Converting experimental data into the kind of information biologists can usefully interpret is a complex problem, and it drives a significant part of our algorithmic work,” says Greengard.

The CCA currently has two research groups. The statistical astronomy group, which Spergel co-leads with David Hogg of New York University, is developing new methods to extract information from large, complex and noisy datasets. The group applies these methods to astrophysical datasets ranging from exoplanets to stars in our own galaxy to cosmological measurements.

Composite image of the full TNG100-1 box, overlaying a projection of the dark matter density with the output of our on-the-fly cosmological shock finder, here used to derive the average Mach number of shocks along each line of sight. All the gravitationally collapsed structures (in orange/white) are surrounded by successive shock surfaces (blue) that encode their formation histories.

Composite image combining the predicted X-ray (free-free) emission from hot gas in halos (purple to orange) with the Mach number of hydrodynamical shocks (from red to white, showing increasing strength). This view shows an extremely thin slice (only 100 pkpc thick) of the full TNG100-1 volume at z=0, centered on the second most massive cluster. Low Mach number shocks at intergalactic filaments (red) converge into quasi-spherical, high Mach number accretion shocks (white) that mark the boundaries between voids and gaseous halos.

TNG100-1: Full-box composite that combines gas temperature (as the color) and shock Mach number (as the brightness). Red indicates gas at 10 million kelvin at the centers of massive galaxy clusters, while bright structures show diffuse gas from the intergalactic medium being shock-heated at the boundary between cosmic voids and filaments.

Gas column density across the full large-volume TNG300-1 simulation at redshift zero. We center on the most massive cluster in the box, which emerges as a 10 Mpc scale gas overdensity in the center of this image. Otherwise, the homogeneity of large-scale structure begins to emerge across the extent of this volume. Here a thin slice of only 100 kpc in depth is shown.

The other group focuses on galaxy formation and is co-led by Rachel Somerville, who holds a joint position at Rutgers University and Flatiron, and Greg Bryan, who holds a joint position at Columbia University and Flatiron. This group is developing sophisticated numerical simulations of the formation of galaxies and supermassive black holes. One of the challenges of modeling galaxy formation is that it results from processes acting over a vast range of scales: from individual stars and black holes to cosmological scales of billions of light-years. The group plans to adopt a novel approach to this problem by carrying out numerical experiments to study the ‘small-scale’ processes (such as how dense gas is converted into stars within galaxies, or how energy emitted by accreting black holes affects their host galaxies) and incorporating the insights they gain into larger-scale simulations. “This project is too big for any one researcher to be able to tackle. The breadth of expertise that we gather at the CCA enables us to realistically take on a project like this,” says Somerville.

The animation shows a high-resolution hydrodynamical simulation of a star-forming dwarf galaxy in both face-on (left panel) and edge-on (right panel) views. The color scale indicates the gas density (red = denser, blue = more diffuse). At about 80 Myr the galaxy starts to form stars, which are shown as circles: white circles are low-mass stars and blue circles are massive stars. At the end of their lifetimes, the massive stars explode as supernovae, which inject enormous energy into the interstellar medium, create “bubbles” (the holes that come and go) and eventually drive galactic outflows, i.e., gas being pushed out of the galaxy, as can be seen more clearly in the right panel.

The galaxy formation group will also serve the research community by creating a publicly available database of the simulation outputs and mock observations created from them. “There’s always more you can do with any simulation,” Somerville says. Making this database freely available will allow others to benefit from the work done at Flatiron.

The SCC, in addition to providing all-important computing capacity, helps researchers’ codes run faster and better by adapting them to make efficient use of the institute’s computational resources. Flatiron scientists typically prototype programs on their own desktop or laptop computers, but the techniques that work on small computers often need to be modified in order to run well on a high-performance computing cluster. When scientists are ready to move from their laptops to an SCC cluster, “that transition often requires some collaborative effort,” Carriero says. “We want to make sure the computing is not what’s limiting their progress.”

The Flatiron Institute is forming close relationships with universities and other research institutes. “We want to make sure that the computational biology we do is tied to concrete problems,” Greengard says. As a result, Flatiron researchers typically have outside collaborators whose experiments influence the modeling and data analysis questions taken on in the CCB. “Our ongoing collaborations have generated a lot of enthusiasm on both sides.”

In the long run, researchers hope that the institute will support cross-disciplinary work as well. Although it is not always obvious, many of the same computational tasks come up in different scientific areas, from biology to astrophysics. “We do fluid dynamics, biologists do fluid dynamics,” Spergel says. “Their equations are applied on the scale of cells, ours are applied on the scale of galaxies. Still, we use the same fundamental equations.” By making it easy for researchers from the different centers to work together, Flatiron aims to help them learn from one another. “We are already becoming a place where people come to learn new algorithms and approaches. Because of the way we span fields, we have the potential to be a unique place in transferring information, approaches and techniques between areas,” Spergel says.

In the hallways of the institute, the scientists and other staff have all the spirit and momentum you’d expect at an ambitious, one-of-a-kind startup. “The mix of people here is different from the mix anywhere else I know,” Spergel says. “At most universities and industry labs, computational biologists don’t run into computational physicists at lunch. Here, we do.”

Simons Foundation

Annual Report