2596 Publications

The magnetic gradient scale length explains why certain plasmas require close external magnetic coils

John Kappel, Matt Landreman, D. Malhotra

The separation between the last closed flux surface of a plasma and the external coils that magnetically confine it is a limiting factor in the construction of fusion-capable plasma devices. This plasma-coil separation must be large enough so that components such as a breeding blanket and neutron shielding can fit between the plasma and the coils. Plasma-coil separation affects reactor size, engineering complexity, and particle loss due to field ripple. For some plasmas it can be difficult to produce the desired flux surface shaping with distant coils, and for other plasmas it is infeasible altogether. Here, we seek to understand the underlying physics that limits plasma-coil separation and explain why some configurations require close external coils. In this paper, we explore the hypothesis that the limiting plasma-coil separation is set by the shortest scale length of the magnetic field as expressed by the ∇B tensor. We test this hypothesis on a database of 40 stellarator and tokamak configurations. Within this database, the coil-to-plasma distance compared to the minor radius varies by over an order of magnitude. The magnetic scale length is well correlated to the coil-to-plasma distance of actual coil designs generated using the REGCOIL method (Landreman 2017 Nucl. Fusion 57 046003). Additionally, this correlation reveals a general trend that larger plasma-coil separation is possible with a small number of field periods.
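To make the figure of merit concrete: a minimal sketch of the scale-length computation, assuming the definition L∇B = √2 |B| / ‖∇B‖_F (Frobenius norm of the gradient tensor) and hypothetical array inputs sampled on the plasma boundary:

```python
import numpy as np

def gradB_scale_length(B, gradB):
    """Local magnetic gradient scale length at each evaluation point.

    B     : (N, 3) magnetic field vectors on the plasma boundary.
    gradB : (N, 3, 3) gradient tensor dB_i/dx_j at the same points.
    Returns the (N,) array of L_gradB = sqrt(2) * |B| / ||gradB||_F.
    """
    B_mag = np.linalg.norm(B, axis=1)
    gradB_frob = np.linalg.norm(gradB, axis=(1, 2))  # Frobenius norm
    return np.sqrt(2.0) * B_mag / gradB_frob

# The hypothesis compares the *shortest* such scale length on the boundary
# with the achievable coil-to-plasma distance:
# L_min = gradB_scale_length(B, gradB).min()
```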


Generalization in diffusion models arises from geometry-adaptive harmonic representations

Zahra Kadkhodaie, Florentin Guth, E. P. Simoncelli, S. Mallat

Deep neural networks (DNNs) trained for image denoising are able to generate high-quality samples with score-based reverse diffusion algorithms. These impressive capabilities seem to imply an escape from the curse of dimensionality, but recent reports of memorization of the training set raise the question of whether these networks are learning the "true" continuous density of the data. Here, we show that two DNNs trained on non-overlapping subsets of a dataset learn nearly the same score function, and thus the same density, when the number of training images is large enough. In this regime of strong generalization, diffusion-generated images are distinct from the training set, and are of high visual quality, suggesting that the inductive biases of the DNNs are well-aligned with the data density. We analyze the learned denoising functions and show that the inductive biases give rise to a shrinkage operation in a basis adapted to the underlying image. Examination of these bases reveals oscillating harmonic structures along contours and in homogeneous regions. We demonstrate that trained denoisers are inductively biased towards these geometry-adaptive harmonic bases since they arise not only when the network is trained on photographic images, but also when it is trained on image classes supported on low-dimensional manifolds for which the harmonic basis is suboptimal. Finally, we show that when trained on regular image classes for which the optimal basis is known to be geometry-adaptive and harmonic, the denoising performance of the networks is near-optimal.
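The "shrinkage in an adapted basis" claim can be probed numerically: treating the denoiser as locally linear, the eigenvectors of its Jacobian form the adaptive basis and the eigenvalues act as per-coefficient shrinkage factors. A minimal sketch of that diagnostic, with a toy stand-in network (architecture and scale are hypothetical, not the paper's):

```python
import torch

torch.manual_seed(0)
d = 64  # e.g. an 8x8 image, flattened (toy scale)

# Toy stand-in for a trained denoiser; in the paper this is a deep network
# trained for image denoising.
denoiser = torch.nn.Sequential(
    torch.nn.Linear(d, 256), torch.nn.ReLU(), torch.nn.Linear(256, d)
)

y = torch.randn(d)  # a noisy input image

# Locally linear view: f(y + dy) ~ f(y) + J dy, where J is the Jacobian.
J = torch.autograd.functional.jacobian(denoiser, y)

# Eigenvectors of the symmetrized Jacobian give the adaptive basis;
# eigenvalues act as shrinkage factors on the coefficients in that basis.
evals, evecs = torch.linalg.eigh(0.5 * (J + J.T))

# Factors near 1 preserve a coefficient; factors near 0 suppress it.
print(evals.flip(0)[:10])
```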


Scaling Laws for Associative Memories

Vivien Cabannes, Elvis Dohmatob, A. Bietti

Learning arguably involves the discovery and memorization of abstract rules. The aim of this paper is to study associative memory mechanisms. Our model is based on high-dimensional matrices consisting of outer products of embeddings, which relates to the inner layers of transformer language models. We derive precise scaling laws with respect to sample size and parameter size, and discuss the statistical efficiency of different estimators, including optimization-based algorithms. We provide extensive numerical experiments to validate and interpret theoretical results, including fine-grained visualizations of the stored memory associations.
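For readers unfamiliar with the setup, the basic object is a weight matrix formed by summing outer products of input and output embeddings, with recall by projecting onto the output embeddings. A minimal sketch under Gaussian random embeddings (dimensions are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 128, 300  # embedding dimension, number of stored input-output pairs

E_in = rng.standard_normal((N, d)) / np.sqrt(d)   # input embeddings
E_out = rng.standard_normal((N, d)) / np.sqrt(d)  # output embeddings

# Store every association i -> i as a single sum of outer products.
W = np.einsum('ni,nj->ij', E_out, E_in)

# Recall: apply W to an input embedding, decode by nearest output embedding.
query = E_in[17]
scores = E_out @ (W @ query)
print(scores.argmax())  # recovers 17 when d is large enough relative to N
```

The scaling laws in the paper quantify exactly this trade-off: how recall accuracy degrades as the number of stored pairs N grows relative to the capacity set by d.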


On the universality of neural encodings in CNNs

Florentin Guth, B. Ménard

We explore the universality of neural encodings in convolutional neural networks trained on image classification tasks. We develop a procedure to directly compare the learned weights rather than their representations. It is based on a factorization of spatial and channel dimensions and measures the similarity of aligned weight covariances. We show that, for a range of layers of VGG-type networks, the learned eigenvectors appear to be universal across different natural image datasets. Our results suggest the existence of a universal neural encoding for natural images. They explain, at a more fundamental level, the success of transfer learning. Our work shows that, instead of aiming at maximizing the performance of neural networks, one can alternatively attempt to maximize the universality of the learned encoding, in order to build a principled foundation model.
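The comparison procedure can be illustrated schematically: compute the covariance of a layer's filter weights, eigendecompose it, and measure how well the leading eigenspaces of two independently trained networks align. A minimal sketch that collapses the spatial and channel dimensions into a single flattening (the paper factorizes them separately; all inputs here are hypothetical):

```python
import numpy as np

def covariance_eigvecs(W):
    """Leading eigenvectors of a conv layer's weight covariance.

    W : array of shape (out_ch, in_ch, k, k). Each output filter is
    flattened into one sample; the covariance is taken over filters.
    """
    X = W.reshape(W.shape[0], -1)
    X = X - X.mean(axis=0, keepdims=True)
    cov = X.T @ X / X.shape[0]
    _, vecs = np.linalg.eigh(cov)
    return vecs[:, ::-1]  # leading eigenvector first

def subspace_overlap(V1, V2, m=20):
    """Overlap of the leading m-dimensional eigenspaces (1 = identical)."""
    M = V1[:, :m].T @ V2[:, :m]
    return float((M ** 2).sum() / m)

# With W1, W2 the weights of the same layer from two networks trained on
# different natural image datasets, a high overlap indicates a shared
# (universal) encoding:
# subspace_overlap(covariance_eigvecs(W1), covariance_eigvecs(W2))
```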


Complex priors and flexible inference in recurrent circuits with dendritic nonlinearities

Benjamin S. H. Lyo, Cristina Savin

Despite many successful examples in which probabilistic inference can account for perception, we have little understanding of how the brain represents and uses structured priors that capture the complexity of natural input statistics. Here we construct a recurrent circuit model that can implicitly represent priors over latent variables, and combine them with sensory and contextual sources of information to encode task-specific posteriors. Inspired by the recent success of diffusion models as means of learning and using priors over images, our model uses dendritic nonlinearities optimized for denoising, and stochastic somatic integration with the degree of noise modulated by an oscillating global signal. Combining these elements into a recurrent network yields a stochastic dynamical system that samples from the prior at a rate prescribed by the period of the global oscillator. Additional inputs reflecting sensory or top-down contextual information alter these dynamics to generate samples from the corresponding posterior, with different input gating patterns selecting different inference tasks. We demonstrate that this architecture can sample from low dimensional nonlinear manifolds and multimodal posteriors. Overall, the model provides a new framework for circuit-level representation of probabilistic information, in a format that facilitates flexible inference.
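The sampling mechanism is closely analogous to annealed Langevin dynamics, with the oscillating global signal playing the role of the diffusion noise schedule. A minimal sketch under a hypothetical bimodal prior, with the learned dendritic nonlinearity replaced by a hand-written drift (everything here is illustrative, not the paper's circuit):

```python
import numpy as np

rng = np.random.default_rng(1)

def dendritic_drift(x):
    # Stand-in for the dendritic nonlinearity optimized for denoising:
    # here it pulls the state toward a bimodal prior at +/-1.
    return np.tanh(4.0 * x) - x

steps_per_cycle, n_cycles, dt = 500, 5, 0.01
x = rng.standard_normal()
samples = []
for t in range(n_cycles * steps_per_cycle):
    phase = (t % steps_per_cycle) / steps_per_cycle
    sigma = 1.0 - phase  # somatic noise decays over each global oscillation
    x += dendritic_drift(x) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    if t % steps_per_cycle == steps_per_cycle - 1:
        samples.append(float(x))  # one prior sample per oscillation period

print(samples)  # values near -1 or +1: draws from the bimodal prior
```

In this picture, adding a sensory input as an extra drift term would tilt the dynamics toward the corresponding posterior, which is the role the gated inputs play in the model.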


Should Under-parameterized Student Networks Copy or Average Teacher Weights?

B. Şimşek, Amire Bendjeddou, Wulfram Gerstner, Johanni Brea

Any continuous function f∗ can be approximated arbitrarily well by a neural network with sufficiently many neurons k. We consider the case when f∗ itself is a neural network with one hidden layer and k neurons. Approximating f∗ with a neural network with n < k neurons can thus be seen as fitting an under-parameterized “student” network with n neurons to a “teacher” network with k neurons. As the student has fewer neurons than the teacher, it is unclear whether each of the n student neurons should copy one of the teacher neurons or rather average a group of teacher neurons. For shallow neural networks with erf activation function and for the standard Gaussian input distribution, we prove that “copy-average” configurations are critical points if the teacher’s incoming vectors are orthonormal and its outgoing weights are unitary. Moreover, the optimum among such configurations is reached when n − 1 student neurons each copy one teacher neuron and the n-th student neuron averages the remaining k − n + 1 teacher neurons. For the student network with n = 1 neuron, we additionally provide a closed-form solution of the non-trivial critical point(s) for commonly used activation functions by solving an equivalent constrained optimization problem. Empirically, we find for the erf activation function that gradient flow converges either to the optimal copy-average critical point or to another point where each student neuron approximately copies a different teacher neuron. Finally, we find similar results for the ReLU activation function, suggesting that the optimal solution of under-parameterized networks has a universal structure.
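A minimal numerical sketch of this teacher-student setup, assuming an orthonormal erf teacher with unit output weights and a student trained by stochastic gradient descent on Gaussian inputs (sizes and training details are illustrative, not the paper's):

```python
import torch

torch.manual_seed(0)
d, k, n = 16, 4, 3  # input dim, teacher width k, student width n < k

# Teacher: erf network with orthonormal incoming vectors, unit output weights.
W_teacher = torch.linalg.qr(torch.randn(d, k))[0].T  # k orthonormal rows
def teacher(x):
    return torch.erf(x @ W_teacher.T).sum(dim=1)

# Student: n erf neurons trained on fresh Gaussian batches, approximating
# the population loss for standard Gaussian input.
W = (0.1 * torch.randn(n, d)).requires_grad_()
a = torch.ones(n, requires_grad=True)
opt = torch.optim.Adam([W, a], lr=1e-2)

for step in range(5000):
    x = torch.randn(4096, d)
    loss = ((torch.erf(x @ W.T) @ a - teacher(x)) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Each row is one student neuron's overlap with the k teacher neurons:
# a "copy" row has a single entry near 1, an "average" row spreads
# comparable weight over several teacher neurons.
overlap = (W / W.norm(dim=1, keepdim=True)) @ W_teacher.T
print(overlap.detach().round(decimals=2))
```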


Protein dynamics underlying allosteric regulation

Miro A. Astore, Akshada S. Pradhan, E. Thiede, S. Hanson

Allostery is the mechanism by which information and control are propagated in biomolecules. It regulates ligand binding, chemical reactions, and conformational changes. An increasing level of experimental resolution and control over allosteric mechanisms promises a deeper understanding of the molecular basis for life and powerful new therapeutics. In this review, we survey the literature for an up-to-date biological and theoretical understanding of protein allostery. By delineating five ways in which the energy landscape or the kinetics of a system may change to give rise to allostery, we aim to help the reader grasp its physical origins. To illustrate this framework, we examine three systems that display these forms of allostery: allosteric inhibitors of beta-lactamases, thermosensation of TRP channels, and the role of kinetic allostery in the function of kinases. Finally, we summarize the growing power of computational tools available to investigate the different forms of allostery presented in this review.


The chromatin landscape of healthy and injured cell types in the human kidney

Debora L. Gisch, Michelle Brennan, W. Mao, et al.

There is a need to define regions of gene activation or repression that control human kidney cells in states of health, injury, and repair to understand the molecular pathogenesis of kidney disease and design therapeutic strategies. Comprehensive integration of gene expression with epigenetic features that define regulatory elements remains a significant challenge. We measure dual single nucleus RNA expression and chromatin accessibility, DNA methylation, and H3K27ac, H3K4me1, H3K4me3, and H3K27me3 histone modifications to decipher the chromatin landscape and gene regulation of the kidney in reference and adaptive injury states. We establish a spatially-anchored epigenomic atlas to define the kidney’s active, silent, and regulatory accessible chromatin regions across the genome. Using this atlas, we note distinct control of adaptive injury in different epithelial cell types. A proximal tubule cell transcription factor network of ELF3, KLF6, and KLF10 regulates the transition between health and injury, while in thick ascending limb cells this transition is regulated by NR2F1. Further, combined perturbation of ELF3, KLF6, and KLF10 distinguishes two adaptive proximal tubular cell subtypes, one of which manifested a repair trajectory after knockout. This atlas will serve as a foundation to facilitate targeted cell-specific therapeutics by reprogramming gene regulatory networks.


Modulation of Aβ16–22 aggregation by glucose

Meenal Jain, A. Sahoo, Silvina Matysiak

The self-assembly of amyloid-beta (Aβ) peptides into fibrillar structures in the brain is a signature of Alzheimer's disease. Recent studies have reported correlations between Alzheimer's disease and type-2 diabetes. Structurally, hyperglycemia induces covalent protein cross-linking by advanced glycation end products (AGEs), which can affect the stability of Aβ oligomers. In this work, we leverage physics-based coarse-grained molecular simulations to probe alternate thermodynamic pathways that affect peptide aggregation propensities at varying concentrations of glucose molecules. Consistent with previous experimental reports, our simulations show a glucose concentration-dependent increase in Aβ aggregation rates, without changes in the overall secondary structure content. We discovered that glucose molecules prefer to partition onto the aggregate–water interface at a specific orientation, resulting in a loss of molecular rotational entropy. This effectively hastens the aggregation rates, as peptide self-assembly can reduce the available surface area for peptide–glucose interactions. This work introduces a new thermodynamically driven pathway, beyond chemical cross-linking, that can modulate Aβ aggregation.
