- Organized by
Eric Vanden-Eijnden, Ph.D., New York University
Molecular dynamics (MD) simulations allow us to treat the motion and interactions of large biomolecules in full atomistic detail at spatio-temporal resolutions that are typically beyond experimental reach. This capability is revolutionizing the way we understand biochemical processes by tackling critical puzzles in molecular biology, from the mechanism behind protein misfolding in diseases like Alzheimer’s, to channel conformations by which viruses like HIV and influenza infect eukaryotic cells, all the way to pathways that can overcome certain types of cancers.
More than half a century after the inception of MD, hardware developments, such as special-purpose high-performance computers, GPUs and massively parallel simulations, together with methodological improvements, have put us in a position to use MD simulations to tackle fundamental biological problems. While the steady growth in computing power facilitates the generation of longer trajectories in ever bigger systems, these trajectories are of very little use without the appropriate statistical framework because biological systems are constantly fluctuating at molecular length scales. These fluctuations are random in nature and irreproducible in detail, and only some of their statistical features are relevant to their biochemical function. It is therefore important to establish robust methods for the inspection of the MD simulation data.
This conference will review recent developments in statistical mechanics theories of biomolecular systems, identifying the key statistical descriptors this approach entails. We will discuss how to design adaptive algorithms to calculate these descriptors directly and efficiently using the MD engine as a building block. In particular, we will focus on methods capable of accelerating the exploration and sampling of the conformational and trajectory spaces of complex molecular systems and analyzing the output of these simulations using systematic tools that bypass the standard “look-and-see” techniques.
The conference discussed in depth the themes listed in the synopsis and explored new trends in the field of molecular dynamics (MD) simulations of biomolecular systems. Perhaps the main take-home message was to confirm that brute-force MD simulations alone will not be sufficient to understand the inner workings of biomolecular systems. In order to meet this goal, we will need to use MD simulations in concert with cutting-edge tools from nonequilibrium statistical mechanics, probability theory, scientific computing and machine learning. A few highlights:
On the theoretical and computational sides, several talks discussed systematic strategies for free energy calculations, which are essential in the context of ion permeation, macromolecular conformational changes, ligand binding specificity and protein-protein association. The free energy is the thermodynamic driving force behind these processes, and it can, in principle, be estimated by brute-force MD simulations, but this procedure is far too inefficient in practice. As a result, many enhanced sampling strategies have been proposed over the years. There is a great need for systematizing these strategies by putting them on a firm mathematical basis, e.g., using tools from large deviation theory or eigenvalue analysis to estimate their efficiency and optimize their parameters. Several talks explored these ideas, e.g., those by Abrams, Chodera and Tuckerman.
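To make the notion of a free energy estimate concrete, here is a minimal sketch of the simplest estimator, F(x) = -kT ln p(x), applied to samples drawn with an added bias potential and reweighted by exp(bias/kT) to undo that bias. It is not the method of any of the talks; the function name `free_energy_profile`, its parameters and the toy data are illustrative assumptions.

```python
import math

def free_energy_profile(samples, bias, n_bins, lo, hi, kT=1.0):
    """Histogram estimate of F(x) = -kT ln p(x) from biased samples.

    `bias` is the extra potential that was added during sampling (an
    assumption of this sketch); each sample is reweighted by
    exp(+bias(x)/kT) to recover the unbiased distribution.
    Returns one value per bin, shifted so the lowest bin is zero;
    empty bins are reported as None.
    """
    width = (hi - lo) / n_bins
    weights = [0.0] * n_bins
    for x in samples:
        if lo <= x < hi:
            # Guard against floating-point rounding at the upper edge.
            k = min(int((x - lo) / width), n_bins - 1)
            weights[k] += math.exp(bias(x) / kT)
    total = sum(weights)
    profile = [-kT * math.log(w / total) if w > 0 else None for w in weights]
    # Raises ValueError if no sample fell inside [lo, hi).
    lowest = min(f for f in profile if f is not None)
    return [f - lowest if f is not None else None for f in profile]

# Toy data: unbiased sampling, three samples in the left bin, one in the right.
unbiased = free_energy_profile([0.1, 0.2, 0.3, 0.6], lambda x: 0.0, 2, 0.0, 1.0)

# Toy data: the right half was sampled with an extra bias of 1 kT.
step_bias = lambda x: 1.0 if x >= 0.5 else 0.0
reweighted = free_energy_profile([0.1, 0.6], step_bias, 2, 0.0, 1.0)
```

In the unbiased case the right bin sits kT ln 3 above the left one, reflecting the 3:1 population ratio. Enhanced sampling methods differ mainly in how the bias is chosen and adapted, which this sketch takes as given.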
Another important theme was understanding the mechanism, pathways and rate of activated processes, which are key in many important biochemical processes, such as biochemical reactions, conformational changes of macromolecules or protein-ligand binding. In particular, Vanden-Eijnden discussed how to bypass the brute-force calculation of the reactive trajectories by which activated processes occur, and Bolhuis, Elber, Lelievre, Roux and Weare discussed how to accelerate their sampling using nonequilibrium umbrella sampling methods (NEUS, also known as stratification methods in trajectory space).
There were also several talks that addressed the issue of ‘spatial coarse graining’ in addition to that of temporal coarse graining, which is key in processes such as protein-mediated membrane fusion, transduction cascades or those involving the cytoskeleton, all of which involve systems that are too big to be tackled by stand-alone MD simulations. In this context, the MD engine must be incorporated within multiscale computational schemes, such as the ultra-coarse-graining method presented by Voth, ReaDDy presented by Noé, or the Green’s function reaction dynamics presented by Bolhuis. These techniques fit the framework of the heterogeneous multiscale method, a general framework discussed by Vanden-Eijnden by which simulations involving different physical processes at different spatio-temporal scales can be seamlessly meshed together.
On the data-processing side, Markov state models (MSMs) were discussed at length. The basic idea behind MSMs is to map the original dynamics onto a Markov jump process (MJP) and to process the MD time-series data via statistical inference techniques, such as maximum likelihood estimation or Bayesian inference, to calculate the rate matrix of this MJP. This, in turn, permits the characterization of the system’s kinetics on time scales larger than those reached in the MD simulations; indeed, MSMs have the advantage that they also make it possible to recombine short simulations run in parallel to extract long-time information. Specific issues that were discussed in the talks by Clementi, Elber, Noé, Vanden-Eijnden and Weare are how to optimally close the state in MSMs and how to better interface MSMs with the NEUS techniques.
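As an illustration of the estimation step, the following minimal sketch pools transition counts from several short, already discretized trajectories and row-normalizes them, which is the maximum likelihood estimate for an unconstrained discrete-time Markov chain. It estimates a transition matrix at a fixed lag rather than the rate matrix of the MJP, a common simplification assumed here; the function names and the toy trajectories are hypothetical.

```python
import math

def count_transitions(trajectories, n_states, lag=1):
    """Pool transition counts at the given lag (lag >= 1) over all trajectories."""
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        for i, j in zip(traj[:-lag], traj[lag:]):
            counts[i][j] += 1
    return counts

def transition_matrix(counts):
    """Row-normalize counts: maximum likelihood estimate without constraints
    such as detailed balance."""
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("a state has no outgoing transitions")
        matrix.append([c / total for c in row])
    return matrix

def two_state_timescale(matrix, lag=1):
    """Implied relaxation timescale of a two-state model, in trajectory steps.

    For two states the second eigenvalue is 1 - T[0][1] - T[1][0];
    math.log raises ValueError if that eigenvalue is not positive.
    """
    second_eigenvalue = 1.0 - matrix[0][1] - matrix[1][0]
    return -lag / math.log(second_eigenvalue)

# Three short toy trajectories over states {0, 1}, as if run in parallel.
trajectories = [
    [0, 0, 0, 1, 1],
    [1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
]
counts = count_transitions(trajectories, n_states=2)
T = transition_matrix(counts)
timescale = two_state_timescale(T)
```

No single trajectory here samples both the 0 to 1 and the 1 to 0 transitions more than once, yet the pooled counts yield a full two-state model, which is the sense in which short parallel simulations can be recombined.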
Another interesting topic discussed in several talks was how to use available data from experiments or ab initio calculations to accelerate the MD simulations, validate the force fields or build better coarse-grained models of biomolecular systems. For example, Voth discussed how to incorporate experimental data in the design of coarse-grained models; Chodera showed how to use training data from experimental measurements and high-level ab initio calculations to optimize a classical molecular model of water via efficient least-squares fitting; Clementi used diffusion maps to learn on the fly from the output of MD simulations and accelerate them; and Dill discussed a method termed ‘modeling employing limited data’ (MELD) that incorporates structural or heuristic information into MD simulations for improved protein structure determination.
Concerning the biological aspects, a key theme was protein folding, which remains one of the most fascinating puzzles of biochemistry. As noted by Dill, the problem involves three broad questions: (i) What is the physical code by which an amino acid sequence dictates a protein’s native structure? (ii) How can proteins fold so fast? (iii) Can we devise a computer algorithm to predict protein structures from their sequences? Dill discussed accelerated MD simulations using MELD whose performance at predicting native structures is comparable to or better than that of bioinformatics tools such as Xplor-NIH or ROSETTA-EPR in the context of CASP (Critical Assessment of Structure Prediction), a blind assessment competition involving close to 200 research groups.
Another key question in biochemistry is that of protein-ligand binding, which is of crucial importance in the contexts of drug design and discovery. This problem is typically tackled via algorithms such as DOCK, which are fast and often able to find correct binding sites and ligand poses but rarely give accurate binding affinities. Several talks, in particular those of Chodera, Elber and Roux, showed that accelerated MD simulations could have a transformative effect in this context, because they use better potentials and perform more complete conformational sampling and solvation. In particular, they discussed how the residence time that the drug spends bound to its target — rather than the thermodynamic binding free energy — is often the key factor in determining drug efficacy.
The talk of Chodera also discussed how MD simulations could be used to help elucidate the mechanism of kinase signal transduction cascades, which are ubiquitously found in biology as a means of integrating and amplifying weak extracellular signals to control programmed cellular responses. For example, the RAS-ERK pathway is an evolutionarily conserved signaling pathway that makes use of such a cascade to control cell survival and proliferation; this specific pathway has a high frequency of mutation in cancer, with 30 percent of all cancers harboring mutations in RAS, which also leads to a poor prognosis for survival.
Another important biological problem that was addressed is that of protein-mediated membrane fusion, which is the biological response to the need to transfer the contents of one encapsulated cellular subcompartment in eukaryotic cells to another. The subcompartments, like the endoplasmic reticula (ER), Golgi bodies, vacuoles, endosomes, mitochondria, etc., are defined by boundaries made of lipid bilayers, like the outer membrane of the cell itself. Viruses, such as HIV and influenza, that infect eukaryotic cells are also encapsulated in lipid bilayers stolen from the infected cells that produce them. Abrams, Elber and Roux showed how to use accelerated MD simulations in this context.
Going larger in scale, there were also several talks discussing biomolecular aspects that involve the cell itself. In particular, Noé and Voth discussed how MD simulations combined with coarse-graining tools can be used to study how cells spatially and temporally regulate the formation and maintenance of multiple cytoskeleton networks within the same crowded cytoplasm.
MONDAY, NOVEMBER 13
9:00 AM Eric Vanden-Eijnden | Introductory Remarks
9:30 AM Ken Dill | Towards Better Scaling in Physics-based Modeling of Biomolecules
10:20 AM John Chodera | Challenges for Bridging Scales in Cancer
11:30 AM Mark Tuckerman | Accelerated Exploration and Learning of Conformational Free Energy Landscapes
12:20 PM Jonathan Weare | Stratification for Rare Event Problems
2:30 PM Benoit Roux | Computational Studies of Bimolecular Systems - Challenges and Solutions
3:20 PM Cameron Abrams | On-the-Fly Free-Energy Parameterization: Better Statistics in Biomolecular Simulations from Enhanced Sampling
4:30 PM Eric Vanden-Eijnden | Modeling Reactive Events in Molecular Systems
5:20 PM Day One Summary Discussion
TUESDAY, NOVEMBER 14
9:00 AM Gregory A. Voth | Systematic Coarse-graining from the Bottom Up
9:50 AM Frank Noe | From All-atom Molecular Kinetics Beyond the Seconds Timescale to Cellular Signal Transduction
11:00 AM Ron Elber | Atomically Detailed Simulations in Molecular Biophysics at Long Time Scales
11:50 AM Cecilia Clementi | Incorporating Experimental Data in Long Timescales Macromolecular Simulations
2:00 PM Jianfeng Lu | Accelerated Equilibrium and Non-equilibrium Sampling: From Mathematics to Numerical Methods
2:50 PM Tony Lelievre | Sampling in Molecular Dynamics: Numerical and Mathematical Challenges
4:00 PM Peter Bolhuis | Bridging Length and Time Scales with Transition Path Sampling and Green’s Function Reaction Dynamics
4:50 PM Talk 8
5:40 PM Day Two Summary Discussion