2789 Publications

Understanding Input Selectivity in Mamba: Impact on Approximation Power, Memorization, and Associative Recall Capacity

T. Huang, Miguel Sarabia, et al.

State-Space Models (SSMs), and particularly Mamba, have recently emerged as a promising alternative to Transformers. Mamba introduces input selectivity to its SSM layer (S6) and incorporates convolution and gating into its block definition. While these modifications do improve Mamba's performance over its SSM predecessors, it remains largely unclear how Mamba leverages the additional functionalities provided by input selectivity, and how these interact with the other operations in the Mamba architecture. In this work, we demystify the role of input selectivity in Mamba, investigating its impact on function approximation power, long-term memorization, and associative recall capabilities. In particular: (i) we prove that the S6 layer of Mamba can represent projections onto Haar wavelets, providing an edge over its Diagonal SSM (S4D) predecessor in approximating discontinuous functions commonly arising in practice; (ii) we show how the S6 layer can dynamically counteract memory decay; (iii) we provide analytical solutions to the MQAR associative recall task using the Mamba architecture with different mixers (Mamba, Mamba-2, and S4D). We demonstrate the tightness of our theoretical constructions with empirical results on concrete tasks. Our findings offer a mechanistic understanding of Mamba and reveal opportunities for improvement.
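To make the contrast between an S4D-style and an S6-style layer concrete, here is a minimal Python sketch of the scalar recurrence h_t = exp(Δ·a)·h_{t−1} + Δ·b·x_t, y_t = c·h_t. It is a toy under stated assumptions, not the paper's construction: a single channel with a one-dimensional state, a simplified (Euler-style) discretization of b, and only the step size Δ made input-dependent, through a softplus of a hypothetical per-token control value `gate` (in Mamba, B and C are input-dependent as well). The function and parameter names (`lti_scan`, `selective_scan`, `w_delta`, `b_delta`) are illustrative. With a fixed Δ the memory of the first input decays geometrically; letting Δ shrink on later tokens keeps it almost intact, which is one simple way a selective layer can counteract memory decay.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)); keeps the step size delta positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def lti_scan(xs, a, delta, b, c):
    """S4D-like toy: scalar diagonal SSM whose delta, b, c ignore the input."""
    a_bar = math.exp(delta * a)          # discretized decay, fixed for all steps
    h, ys = 0.0, []
    for x in xs:
        h = a_bar * h + delta * b * x    # simplified (Euler-style) input term
        ys.append(c * h)
    return ys


def selective_scan(tokens, a, w_delta, b_delta, b, c):
    """S6-like toy: delta_t is computed from each token (x, gate).

    `gate` is a hypothetical control value; here only delta is selective,
    whereas Mamba also makes B and C input-dependent.
    """
    h, ys = 0.0, []
    for x, gate in tokens:
        delta = softplus(w_delta * gate + b_delta)
        a_bar = math.exp(delta * a)      # small delta -> a_bar near 1 (retain state)
        h = a_bar * h + delta * b * x    # large delta -> a_bar near 0 (overwrite state)
        ys.append(c * h)
    return ys
```

For example, `lti_scan([1.0] + [0.0] * 20, a=-1.0, delta=1.0, b=1.0, c=1.0)` has almost completely forgotten the first input by the last step, while `selective_scan` fed `(1.0, 5.0)` followed by twenty `(0.0, -10.0)` tokens (with `a=-1.0, w_delta=1.0, b_delta=0.0, b=1.0, c=1.0`) retains more than 99% of it.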

The first complete 3D reconstruction and morphofunctional mapping of an insect eye

Anastasia A Makarova, N. Chua, Anna V Diakova, Inna A Desyatirkina, P. Gunn, Song Pang, C Shan Xu, Herald F Hess, D. Chklovskii, Alexey A Polilov

The structure of compound eyes in arthropods has been the subject of many studies, revealing important biological principles. Until recently, these studies were constrained by the two-dimensional nature of available ultrastructural data. By taking advantage of the novel three-dimensional ultrastructural dataset obtained using volume electron microscopy, we present the first cellular-level reconstruction of the whole compound eye of an insect, the miniaturized parasitoid wasp Megaphragma viggianii. The compound eye of the female M. viggianii consists of 29 ommatidia and contains 478 cells. Despite the almost anucleate brain, all cells of the compound eye contain nuclei. As in larger insects, the dorsal rim area of the eye in M. viggianii contains ommatidia that are believed to be specialized in polarized light detection as reflected in their corneal and retinal morphology. We report the presence of three ‘ectopic’ photoreceptors. Our results offer new insights into the miniaturization of compound eyes and scaling of sensory organs in general.

Heuristic energy-based cyclic peptide design

Q. Zhu, V. Mulligan, Dennis Shasha

Rational computational design is crucial to the pursuit of novel drugs and therapeutic agents. Meso-scale cyclic peptides, which consist of 7-40 amino acid residues, are of particular interest due to their conformational rigidity, binding specificity, degradation resistance, and potential cell permeability. Because there are few natural cyclic peptides, de novo design involving non-canonical amino acids is a potentially useful goal. Here, we develop an efficient pipeline (CyclicChamp) for cyclic peptide design. After converting the cyclic constraint into an error function, we employ a variant of simulated annealing to search for low-energy peptide backbones while maintaining peptide closure. Compared to the previous random sampling approach, which was capable of sampling conformations of cyclic peptides of up to 14 residues, our method both greatly accelerates the computation speed for sampling conformations of small macrocycles (ca. 7 residues) and addresses the high-dimensionality challenge that large macrocycle designs often encounter. As a result, CyclicChamp makes conformational sampling tractable for 15- to 24-residue cyclic peptides, thus permitting the design of macrocycles in this size range. Microsecond-length molecular dynamics simulations on the resulting 15-, 20-, and 24-residue cyclic designs identify designs with kinetic stability. To test their thermodynamic stability, we perform additional replica exchange molecular dynamics simulations and generate free energy surfaces. Three 15-residue designs, one 20-residue design, and one 24-residue design emerge as promising candidates.
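The key step in the abstract, turning the ring-closure constraint into an error term and annealing over backbone degrees of freedom, can be illustrated on a toy problem. The sketch below is plain simulated annealing (Metropolis acceptance, geometric cooling) applied to a hypothetical planar chain of unit-length bonds whose turning angles stand in for backbone torsions; a simple bending penalty stands in for a real energy function. It is not CyclicChamp, and every name and parameter here (`anneal`, `weight`, the cooling schedule, the step size) is an assumption for illustration.

```python
import math
import random


def chain_end(angles):
    """End point of a planar chain of unit bonds; angles are turning angles."""
    x = y = heading = 0.0
    for theta in angles:
        heading += theta
        x += math.cos(heading)
        y += math.sin(heading)
    return x, y


def closure_error(angles):
    # Squared distance between the chain's end and its start (0 = closed ring).
    x, y = chain_end(angles)
    return x * x + y * y


def cost(angles, weight=10.0):
    # Toy "energy" (bending penalty) plus the cyclic constraint as an error term.
    bend = sum(1.0 - math.cos(t) for t in angles)
    return bend + weight * closure_error(angles)


def anneal(n=8, steps=20000, t_start=1.0, t_end=1e-3, step=0.3, seed=0):
    rng = random.Random(seed)
    angles = [0.0] * n                   # start from a straight, open chain
    current = cost(angles)
    best, best_cost = list(angles), current
    for i in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (i / (steps - 1))
        k = rng.randrange(n)
        old = angles[k]
        angles[k] += rng.gauss(0.0, step)  # perturb one randomly chosen angle
        new = cost(angles)
        # Metropolis rule: accept downhill moves, uphill ones with prob exp(-dE/T).
        if new <= current or rng.random() < math.exp((current - new) / temp):
            current = new
            if new < best_cost:
                best, best_cost = list(angles), new
        else:
            angles[k] = old                # reject: restore the previous angle
    return best, best_cost
```

Starting from a straight, open chain, the annealer is expected to return a nearly closed ring; the penalty `weight` trades closure accuracy against the energy term, which is the usual design tension when a hard constraint is relaxed into an error function.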

The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models

A. Kirsanov, C. Chou, Kyunghyun Cho, S. Chung

Decoder-only language models have the ability to dynamically switch between various computational tasks based on input prompts. Despite many successful applications of prompting, there is very limited understanding of the internal mechanism behind such flexibility. In this work, we investigate how different prompting methods affect the geometry of representations in these models. Employing a framework grounded in statistical physics, we reveal that various prompting techniques, while achieving similar performance, operate through distinct representational mechanisms for task adaptation. Our analysis highlights the critical role of input distribution samples and label semantics in few-shot in-context learning. We also demonstrate evidence of synergistic and interfering interactions between different tasks on the representational level. Our work contributes to the theoretical understanding of large language models and lays the groundwork for developing more effective, representation-aware prompting strategies.

Distributional Associations vs In-Context Reasoning: A Study of Feed-forward and Attention Layers

Lei Chen, J. Bruna, A. Bietti

Large language models have been successful at tasks involving basic forms of in-context reasoning, such as generating coherent language, as well as at storing vast amounts of knowledge. At the core of the Transformer architecture behind such models are feed-forward and attention layers, which are often associated with knowledge and reasoning, respectively. In this paper, we study this distinction empirically and theoretically in a controlled synthetic setting where certain next-token predictions involve both distributional and in-context information. We find that feed-forward layers tend to learn simple distributional associations such as bigrams, while attention layers focus on in-context reasoning. Our theoretical analysis identifies the noise in the gradients as a key factor behind this discrepancy. Finally, we illustrate how similar disparities emerge in pre-trained models through ablations on the Pythia model family on simple reasoning tasks.
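The two kinds of next-token information contrasted in the abstract can be illustrated without any neural network. In the hypothetical toy below, a bigram table built from training data captures a distributional association (after `q` usually comes `the`), while an induction-style rule copies whatever followed the last token earlier in the current context. This only illustrates the distinction; it is not the paper's synthetic task, and the function names are assumptions.

```python
from collections import Counter, defaultdict


def bigram_predictor(corpus):
    """Distributional association: most frequent successor of each token."""
    counts = defaultdict(Counter)
    for seq in corpus:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}


def in_context_predictor(context):
    """In-context rule: find the most recent earlier occurrence of the last
    token and copy the token that followed it (None if there is none)."""
    last = context[-1]
    for i in range(len(context) - 2, -1, -1):
        if context[i] == last:
            return context[i + 1]
    return None
```

With the corpus `[["q", "the", "cat"], ["q", "the", "dog"], ["a", "q", "the"]]`, the bigram table predicts `the` after `q`, whereas `in_context_predictor(["q", "x", "b", "q"])` returns `x`: the two sources of information disagree, which is the kind of conflict the synthetic setting is built around.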

Superfast Direct Inversion of the Nonuniform Discrete Fourier Transform via Hierarchically Semiseparable Least Squares

Heather Wilber, Ethan N. Epperly, A. Barnett

A direct solver is introduced for solving overdetermined linear systems involving nonuniform discrete Fourier transform matrices. Such matrices can be transformed into a Cauchy-like form that has hierarchical low-rank structure. The rank structure of this matrix is explained, and it is shown that the ranks of the relevant submatrices grow only logarithmically with the number of columns of the matrix. A fast rank-structured hierarchical approximation method based on this analysis is developed, along with a hierarchical least-squares solver for these and related systems. The result is a direct method for inverting nonuniform discrete transforms with a complexity that is usually nearly linear with respect to the degrees of freedom in the problem. This solver is benchmarked against various iterative and direct solvers in the setting of inverting the one-dimensional type-II (or forward) transform, for a range of condition numbers and problem sizes (up to 4 × 10…)
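For a sense of the problem being solved, here is the naive baseline: form the dense type-II NUDFT matrix and solve the overdetermined system by ordinary least squares with NumPy. This costs O(mn²) time and O(mn) memory for m points and n modes, which is exactly what a hierarchically semiseparable solver is meant to avoid. The sign convention, the mode range k = −n/2, …, n/2 − 1, and the helper names are assumptions; libraries differ on these conventions.

```python
import numpy as np


def nudft_type2_matrix(x, n_modes):
    """Dense type-II (forward) NUDFT matrix, A[j, k] = exp(-2*pi*i * k * x_j).

    Assumes points x_j in [0, 1) and integer modes k = -n/2, ..., n/2 - 1.
    """
    k = np.arange(-(n_modes // 2), n_modes - n_modes // 2)
    return np.exp(-2j * np.pi * np.outer(x, k))


def invert_dense(x, samples, n_modes):
    """Dense least-squares inversion: O(m n^2) time, O(m n) memory."""
    A = nudft_type2_matrix(x, n_modes)
    coeffs, *_ = np.linalg.lstsq(A, samples, rcond=None)
    return coeffs
```

For instance, with 128 random points in [0, 1) and 32 modes, applying `invert_dense` to exact forward-transform data typically recovers the coefficients to high accuracy; the dense approach becomes impractical as n grows into the sizes the abstract targets.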

Perceptual learning improves discrimination but does not reduce distortions in appearance

Sarit F.A. Szpiro, Charlie S. Burlingham, E. P. Simoncelli, Marisa Carrasco

Human perceptual sensitivity often improves with training, a phenomenon known as “perceptual learning.” Another important perceptual dimension is appearance, the subjective sense of stimulus magnitude. Are training-induced improvements in sensitivity accompanied by more accurate appearance? Here, we examined this question by measuring both discrimination (sensitivity) and estimation (appearance) responses to near-horizontal motion directions, which are known to be repulsed away from horizontal. Participants performed discrimination and estimation tasks before and after training in either the discrimination or the estimation task or none (control group). Human observers who trained in either discrimination or estimation exhibited improvements in discrimination accuracy, but estimation repulsion did not decrease; instead, it either persisted or increased. Hence, distortions in perception can be exacerbated after perceptual learning. We developed a computational observer model in which perceptual learning arises from increases in the precision of underlying neural representations, which explains this counterintuitive finding. For each observer, the fitted model accounted for discrimination performance, the distribution of estimates, and their changes with training. Our empirical findings and modeling suggest that learning enhances distinctions between categories, a potentially important aspect of real-world perception and perceptual learning.

Charge distribution and helicity tune the binding of septin’s amphipathic helix domain to membranes

C. Edelmaier, Stephen J. Klawa, M. Mofidi, S. Hanson, et al.

Amphipathic helices (AHs) are secondary structures that can facilitate binding of proteins to the membrane by folding into a helix with hydrophobic and hydrophilic faces that interact with the corresponding regions of the lipid membrane. Septins are cytoskeletal proteins that preferentially bind to domains of micron-scale curvature on the cell membrane. Studies have shown that AH domains in septin are essential for curvature sensing. We present the first computational study of septin AH interactions with lipid bilayers. Using all-atom simulations and metadynamics-enhanced sampling, we study the effect of charge distribution at the flanking ends of septin AH on the energy for helical folding and its consequences for the binding configuration and affinity to the membrane. This is relevant to septins, since the net positive charge on the flanking C-terminal amino acids is a conserved property across several organisms. Simulations revealed that the energy barrier for folding is much larger in the neutral-capped AH than in the charge-capped AH, so that only a small fraction of the neutral-capped AH folds and integrates into the membrane, whereas the bound charge-capped AH adopts a largely folded configuration. These observations are consistent with binding measurements of synthetic AH constructs with variable helicity to lipid vesicles. Additionally, we examined an extended AH sequence including eight amino acids upstream and downstream of the AH to mimic the native protein. Again, simulations and experiments show that the extended peptide, with a net positive charge at the C-terminus, adopts a strong helical configuration in solution, giving rise to a higher membrane affinity. Altogether, these results identify the energy cost of AH folding as a regulator of AH binding configuration and affinity, and provide a basic template for parameterizing AH-membrane interactions as a starting point for future multiscale simulations of septin-membrane interactions.
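Metadynamics, mentioned in the abstract, biases a simulation along a collective variable by periodically depositing repulsive Gaussian hills at the values already visited, so that deep free-energy wells gradually fill and barriers are crossed. The sketch below applies the standard (non-well-tempered) scheme to overdamped Langevin dynamics on a hypothetical one-dimensional double-well potential. In the septin study the collective variables and force field come from all-atom simulations, which this toy does not attempt to reproduce; all names and parameter values are illustrative assumptions.

```python
import math
import random


def potential_force(s):
    # Double-well U(s) = (s^2 - 1)^2; returns the force -dU/ds.
    return -4.0 * s * (s * s - 1.0)


def bias(s, centers, height, width):
    # History-dependent bias V(s): sum of Gaussians deposited at visited s values.
    return sum(height * math.exp(-((s - c) ** 2) / (2 * width ** 2)) for c in centers)


def bias_force(s, centers, height, width):
    # -dV/ds of the Gaussian sum; pushes s away from previously visited values.
    return sum(height * (s - c) / width ** 2 * math.exp(-((s - c) ** 2) / (2 * width ** 2))
               for c in centers)


def metadynamics(steps=5000, dt=1e-3, kT=0.1, height=0.05, width=0.1, stride=50, seed=0):
    rng = random.Random(seed)
    s, centers, traj = -1.0, [], []      # start in the left well
    for i in range(steps):
        if i % stride == 0:
            centers.append(s)            # deposit a new hill at the current position
        f = potential_force(s) + bias_force(s, centers, height, width)
        # Overdamped Langevin step (Euler-Maruyama).
        s += f * dt + math.sqrt(2 * kT * dt) * rng.gauss(0.0, 1.0)
        traj.append(s)
    return traj, centers
```

The returned `centers` list records where hills were deposited; evaluating `bias` on a grid gives the accumulated bias, whose negative approximates the free-energy profile (up to a constant) once the visited wells have been filled.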

Formation of Drosophila germ cells requires spatial patterning of phospholipids

Marcus Kilwein, P. Miller, S. Shvartsman, et al.

Germline-soma segregation is crucial for fertility. Primordial germ cells (PGCs) arise early in development and are the very first cells to form in the Drosophila embryo. At the time of PGC formation, the embryo is a syncytium where nuclei divide within a common cytoplasm. Whereas invaginating plasma membrane furrows enclose nuclei to form somatic lineages during the 14th nuclear division cycle, PGCs emerge from the syncytium during the 9th division cycle in a mechanistically distinct process. PGC formation depends on maternally deposited germ granules localized at the embryo’s posterior pole. Germ granules trigger protrusion of membrane buds that enlarge to surround several nuclei that reach the posterior pole. Buds are remodeled to cells through mitotic division and constriction of the bud neck. Previous studies implicated F-actin,[1] actin regulators,[2,3] and contractile ring components[4] in mitotic furrow formation, but what drives bud emergence and how germ granules provoke reshaping of the plasma membrane remain unknown. Here, we investigate the mechanism of germ-granule-induced bud formation. Treating the embryo as a pressurized elastic shell, we used mathematical modeling to examine possible mechanical mechanisms for local membrane protrusion. One mechanism, outward buckling produced by polymerization of a branched F-actin network, is supported by experimental data. Further, we show that germ granules modify membrane lipid composition, promoting local branched F-actin polymerization that initiates PGC formation. We propose that a mechanism for membrane lipid regulation of F-actin dynamics in migrating cells has been adapted for PGC formation in response to spatial cues provided by germ granules.
