2697 Publications

Neocortex: a lean mean memory storage machine

D. Chklovskii, B. Mizusaki, A. Stepanyants, P. Sjöström

Connectivity patterns of neocortex exhibit several odd properties: for example, most neighboring excitatory neurons do not connect, which seems curiously wasteful. Brunel’s elegant theoretical treatment reveals how optimal information storage can naturally impose these peculiar properties.

Derivation of Neural Circuits from the Similarity Matching Principle

Our brains analyze high-dimensional datasets streamed by our sensory organs in multiple stages. Sensory cortices, for example, perform tasks like dimensionality reduction, sparse feature discovery and clustering. To model these tasks we pursue an approach analogous to use of action principles in physics and propose a new family of objective functions based on the principle of similarity matching. From these objective functions we derive online distributed algorithms that can be implemented by biological neural networks resembling cortical circuits. Our networks can adapt to changes in the number of latent dimensions or the number of clusters in the input dataset. Furthermore, we formulate minimax optimization problems from which we derive online algorithms with two classes of neurons identified with principal neurons and interneurons in biological circuits. In addition to bearing resemblance to biological circuits, our algorithms are competitive for Big Data applications.
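
For orientation, a similarity matching objective of the kind referred to above can be sketched as follows. The notation is assumed here and does not appear in the abstract: X holds T input vectors of dimension n as columns, Y holds the corresponding k-dimensional outputs, and the exact normalization and regularization used in the paper may differ.

```latex
\min_{Y \in \mathbb{R}^{k \times T}} \; \frac{1}{T^{2}}
\left\lVert X^{\top} X - Y^{\top} Y \right\rVert_{F}^{2},
\qquad X \in \mathbb{R}^{n \times T}, \quad k < n
```

Read this way, the objective asks that pairwise similarities (inner products) between outputs match those between the corresponding inputs.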

Big Data, Social Media, and Protest

J Tucker, J Nagler, M MacDuffee, P Metzger, D Penfold-Brown, R. Bonneau

The past decade has witnessed a rapid rise in the use of social media around the globe. For political scientists, this is a phenomenon begging to be understood. It has been claimed repeatedly, usually in the absence of solid data, that these social media resources are profoundly shaping participation in social movements, including protest movements (see Bond, Fariss, Jones, Kramer, Marlow, Settle, & Fowler 2012; Cha et al. 2010; Jungherr, Jurgens, & Schoen 2012; Lynch 2011; Shirky 2011). Social media are often assumed to affect an extremely wide range of individual-level behaviors, including communicating about politics to friends and family members, donating or soliciting money for political campaigns and causes, voting, and engaging in collective forms of protest. In truth, however, the research community knows remarkably little about whether (and especially how) the use of social media systematically affects political participation.

Mediator facilitates transcriptional activation and dynamic long-range contacts at the IgH locus during class switch recombination

A Thomas-Claudepierre, I Robert, P Rocha, R Raviram, E Schiavo, V Heyer, R. Bonneau, V Luo, J Reddy, T Borggrefe, J Skok, B Reina-San-Martin

Immunoglobulin (Ig) class switch recombination (CSR) is initiated by the transcription-coupled recruitment of activation-induced cytidine deaminase (AID) to Ig switch regions (S regions). During CSR, the IgH locus undergoes dynamic three-dimensional structural changes in which promoters, enhancers, and S regions are brought to close proximity. Nevertheless, little is known about the underlying mechanisms. In this study, we show that Med1 and Med12, two subunits of the mediator complex implicated in transcription initiation and long-range enhancer/promoter loop formation, are dynamically recruited to the IgH locus enhancers and the acceptor regions during CSR and that their knockdown in CH12 cells results in impaired CSR. Furthermore, we show that conditional inactivation of Med1 in B cells results in defective CSR and reduced acceptor S region transcription. Finally, we show that in B cells undergoing CSR, the dynamic long-range contacts between the IgH enhancers and the acceptor regions correlate with Med1 and Med12 binding and that they happen at a reduced frequency in Med1-deficient B cells. Our results implicate the mediator complex in the mechanism of CSR and are consistent with a model in which mediator facilitates the long-range contacts between S regions and the IgH locus enhancers during CSR and their transcriptional activation.

Ig class switch recombination (CSR) is a long-range DNA recombination reaction that occurs between Ig switch regions (S regions) and that replaces the isotype expressed (from IgM to IgG, IgE, or IgA), providing novel effector functions for efficient antigen clearance (Chaudhuri et al., 2007). CSR is initiated by the transcription-coupled recruitment of activation-induced cytidine deaminase (AID; Basu et al., 2011; Pavri and Nussenzweig, 2011), an enzyme that deaminates cytosines into uracils in the single-strand DNA exposed by transcription (Petersen-Mahrt et al., 2002). During CSR, the choice of recombination to a particular isotype is determined by the activation of specific S region promoters (Basu et al., 2011; Pavri and Nussenzweig, 2011), triggering the generation of noncoding germline transcripts (Chaudhuri et al., 2007). Germline transcription precedes recombination, is induced at both the donor and acceptor S regions, and is required for recombination (Chaudhuri et al., 2007). Transcriptional activation of the IgH locus during CSR is controlled by the Eμ enhancer located upstream of the donor (Sμ) S region and by a major regulatory region (RR) located at the 3′ end of the locus (3′ RR). Both of these enhancer elements are required for transcription and for CSR (Chaudhuri et al., 2007; Pavri and Nussenzweig, 2011). The current model is that during CSR, recombination between S regions proceeds by the inducible formation of long-range DNA loops involving the S region promoters and the Eμ and 3′ RR enhancers (Wuerffel et al., 2007; Kenter et al., 2012), possibly through transcription factors (Feldman et al., 2015). Nevertheless, the molecular mechanisms controlling these conformational changes remain to be elucidated.

Mediator is an evolutionarily conserved multiprotein complex composed of 31 subunits organized in four modules that is required for gene transcription by RNA polymerase II (Pol II; Malik and Roeder, 2010; Conaway and Conaway, 2011). The head, middle, and tail modules form a stable core complex that associates reversibly with the CDK8 module (consisting of cyclin-dependent kinase 8, cyclin C, Med12, and Med13) to control interactions of mediator with the Pol II machinery (Malik and Roeder, 2010; Conaway and Conaway, 2011). Mediator behaves as an interface between Pol II and transcription factors and is capable of promoting Pol II preinitiation complex assembly, transcription initiation by Pol II, regulation of Pol II pausing and elongation, recruitment of transcription elongation factors, and control of the phosphorylation state of the C-terminal domain of Pol II (Malik and Roeder, 2010; Conaway and Conaway, 2011; Allen and Taatjes, 2015). The Med1 subunit of mediator, part of the middle module, interacts with distinct transcriptional activators (Borggrefe and Yue, 2011) and has been shown to play a key role in embryonic development (Ito et al., 2000; Zhu et al., 2000), erythropoiesis (Stumpf et al., 2010), and iNKT cell development (Yue et al., 2011). In addition, Med1 recruitment to chromatin is one of the features that characterizes super enhancers (Whyte et al., 2013). Interestingly, mediator has also been implicated, together with cohesin, in the formation of long-range DNA loops (Malik and Roeder, 2010; Conaway and Conaway, 2011; Allen and Taatjes, 2015), and chromatin immunoprecipitation sequencing (ChIP-Seq) analysis for Smc1, Smc3, Med1, and Med12 revealed that cohesin–mediator binding predicts genomic sites of long-range promoter–enhancer interactions (Kagey et al., 2010; Phillips-Cremins et al., 2013). 
As we have recently implicated the cohesin complex in the mechanism of CSR (Thomas-Claudepierre et al., 2013), we have examined the role of mediator in CSR by performing shRNA-mediated knockdowns of the Med1 and Med12 subunits of mediator (belonging to different modules) in CH12 cells and by conditionally inactivating the Med1 subunit in developing B cells.

Self-calibrating neural networks for dimensionality reduction

Recently, a novel family of biologically plausible online algorithms for reducing the dimensionality of streaming data has been derived from the similarity matching principle. In these algorithms, the number of output dimensions can be determined adaptively by thresholding the singular values of the input data matrix. However, setting such a threshold requires knowing the magnitude of the desired singular values in advance. Here we propose online algorithms where the threshold is self-calibrating based on the singular values computed from the existing observations. To derive these algorithms from the similarity matching cost function we propose novel regularizers. As before, these online algorithms can be implemented by Hebbian/anti-Hebbian neural networks in which the learning rule depends on the chosen regularizer. We demonstrate both mathematically and via simulation the effectiveness of these online algorithms in various settings.
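
To illustrate why a fixed threshold presupposes knowledge of the scale of the singular values, the minimal sketch below contrasts it with a threshold set relative to the observed spectrum. The relative rule (a fixed fraction of the largest singular value), the function names, and the toy spectrum are assumptions made for illustration only; they are not the regularizers or the online algorithms proposed in the paper.

```python
def count_dims_fixed(singular_values, threshold):
    """Count singular values above an absolute, pre-set threshold."""
    return sum(1 for s in singular_values if s > threshold)


def count_dims_relative(singular_values, fraction):
    """Count singular values above a cutoff derived from the data itself.

    Hypothetical rule: the cutoff is `fraction` times the largest
    singular value observed so far.
    """
    if not singular_values:
        return 0
    cutoff = fraction * max(singular_values)
    return sum(1 for s in singular_values if s > cutoff)


# Toy spectrum with two dominant directions, and the same data rescaled.
spectrum = [9.0, 7.5, 0.4, 0.2]
rescaled = [s * 0.01 for s in spectrum]

fixed_original = count_dims_fixed(spectrum, threshold=1.0)
fixed_rescaled = count_dims_fixed(rescaled, threshold=1.0)
relative_original = count_dims_relative(spectrum, fraction=0.1)
relative_rescaled = count_dims_relative(rescaled, fraction=0.1)
```

In this toy case the fixed threshold keeps two dimensions for the original spectrum but none after rescaling, whereas the relative cutoff keeps two dimensions in both cases.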

Do retinal ganglion cells project natural scenes to their principal subspace and whiten them?

R. Abbasi-Asl, C. Pehlevan, B. Yu, D. Chklovskii

Several theories of early sensory processing suggest that it whitens sensory stimuli. Here, we test three key predictions of the whitening theory using recordings from 152 ganglion cells in salamander retina responding to natural movies. We confirm the previous finding that firing rates of ganglion cells are less correlated compared to natural scenes, although significant correlations remain. We show that while the power spectrum of ganglion cells decays less steeply than that of natural scenes, it is not completely flattened. Finally, we find evidence that only the top principal components of the visual stimulus are transmitted.
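
One way to make the first prediction concrete is to compare the mean absolute pairwise correlation of stimulus signals with that of the responses. The sketch below does this on toy data; the function names and the numbers are hypothetical and are not the analysis pipeline or the recordings used in the study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_abs_pairwise_correlation(signals):
    """Average |correlation| over all distinct pairs of signals."""
    pairs = list(combinations(signals, 2))
    return sum(abs(pearson(x, y)) for x, y in pairs) / len(pairs)


# Toy data: two strongly correlated "pixels" and two uncorrelated "cells".
stimulus_pixels = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
cell_rates = [[1.0, 2.0, 3.0, 4.0], [3.0, 1.0, 1.0, 3.0]]

stimulus_corr = mean_abs_pairwise_correlation(stimulus_pixels)
response_corr = mean_abs_pairwise_correlation(cell_rates)
```

Under the whitening hypothesis, `response_corr` would be expected to be smaller than `stimulus_corr`; the abstract reports that this holds for the recorded cells, although significant correlations remain.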

4C-ker: a method to reproducibly identify genome-wide interactions captured by 4C-Seq experiments

R Raviram, P Rocha, C. Müller, E. Miraldi, S Badri, Y Fu, E Swanzey, C Proudhon, V Snetkova, R. Bonneau, J Skok

4C-Seq has proven to be a powerful technique to identify genome-wide interactions with a single locus of interest (or “bait”) that can be important for gene regulation. However, analysis of 4C-Seq data is complicated by the many biases inherent to the technique. An important consideration when dealing with 4C-Seq data is the differences in resolution of signal across the genome that result from differences in 3D distance separation from the bait. This leads to the highest signal in the region immediately surrounding the bait and increasingly lower signals in far-cis and trans. Another important aspect of 4C-Seq experiments is the resolution, which is greatly influenced by the choice of restriction enzyme and the frequency at which it can cut the genome. Thus, it is important that a 4C-Seq analysis method is flexible enough to analyze data generated using different enzymes and to identify interactions across the entire genome. Current methods for 4C-Seq analysis only identify interactions in regions near the bait or in regions located in far-cis and trans, but no method comprehensively analyzes 4C signals of different length scales. In addition, some methods also fail in experiments where chromatin fragments are generated using frequent cutter restriction enzymes. Here, we describe 4C-ker, a Hidden-Markov Model based pipeline that identifies regions throughout the genome that interact with the 4C bait locus. In addition, we incorporate methods for the identification of differential interactions in multiple 4C-Seq datasets collected from different genotypes or experimental conditions. Adaptive window sizes are used to correct for differences in signal coverage in near-bait regions, far-cis and trans chromosomes. Using several datasets, we demonstrate that 4C-ker outperforms all existing 4C-Seq pipelines in its ability to reproducibly identify interaction domains at all genomic ranges with different resolution enzymes.

Environmental gene regulatory influence networks in rice (Oryza sativa): response to water deficit, high temperature and agricultural environments

O. Wilkins, C. Hafemeister, A. Plessis, M.-M. Holloway-Phillips, G. Pham, A.B. Nicotra, G.B. Gregorio, S.V.K. Jagadish, E.M. Septiningsih, R. Bonneau, M. Purugganan

We inferred an environmental gene regulatory influence network (EGRIN) of the response of tropical Asian rice (Oryza sativa) to high temperatures, water deficit and agricultural environments. This network integrates transcriptome data (RNA-seq) and chromatin accessibility measurements (ATAC-seq) from five rice cultivars that were grown in controlled experiments and in agricultural fields. We identified open chromatin regions covering ~2% of the genome. These regions were highly overrepresented proximal to the transcriptional start sites of genes and were used to define the promoters for all genes. We used the occurrences of known cis-regulatory motifs in the promoters to generate a network prior comprising 77,071 interactions. We then estimated the regulatory activity of each TF (TFA; 143 TFs) based on the expression of its target genes in the network prior across 360 experimental conditions. We inferred an EGRIN using the estimated TFA, rather than the TF expression, as the regulator. The EGRIN identified hypotheses for 4,052 genes regulated by 113 TFs; of these, 18% were in the network prior. We resolved distinct regulatory roles for members of a large TF family, including a putative regulatory connection between abiotic stress and the circadian clock, as well as specific regulatory functions for TFs in the drought response. We find that TFA estimation is an effective way of incorporating multiple genome-scale measurements into network inference and that supplementing data from controlled experimental conditions with data from outdoor field conditions increases the resolution of EGRIN inference.
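
The idea of estimating TF activity from the expression of prior targets, rather than from the TF's own transcript level, can be sketched as below. This is a deliberately simplified proxy (the mean expression of each TF's targets in every condition), not the estimator used in the study; the function name, gene and TF identifiers, and expression values are all hypothetical.

```python
def estimate_tfa(prior, expression):
    """Return {tf: [activity per condition]}.

    Simplifying assumption: a TF's activity in a condition is the mean
    expression of its target genes in the network prior.
    """
    activities = {}
    for tf, targets in prior.items():
        # Keep only targets for which expression was measured.
        profiles = [expression[g] for g in targets if g in expression]
        if not profiles:
            continue
        # zip(*profiles) groups the values of all targets by condition.
        activities[tf] = [sum(vals) / len(vals) for vals in zip(*profiles)]
    return activities


# Hypothetical prior (TF -> target genes) and expression (gene -> conditions).
prior = {"TF_A": ["g1", "g2"], "TF_B": ["g3"], "TF_C": ["g9"]}
expression = {
    "g1": [1.0, 3.0],
    "g2": [3.0, 5.0],
    "g3": [0.5, 0.1],
}

tfa = estimate_tfa(prior, expression)
```

TFs without any measured target (here `TF_C`) receive no activity estimate, which is one simple way a list of candidate regulators can shrink before network inference.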

March 3, 2016

Tweeting identity? Ukrainian, Russian, and #Euromaidan

M MacDuffee Metzger, R. Bonneau, J Nagler, J Tucker

Why and when do group identities become salient? Existing scholarship has suggested that insecurity and competition over political and economic resources as well as increased perceptions of threat from the out-group tend to increase the salience of ethnic identities. Most of the work on ethnicity, however, is either experimental and deals with how people respond once identity has already been primed, is based on self-reported measures of identity, or driven by election results. In contrast, here we examine events in Ukraine from late 2013 (the beginning of the Euromaidan protests) through the end of 2014 to see if particular moments of heightened political tension led to increased identification as either “Russian” or “Ukrainian” among Ukrainian citizens. In tackling this question, we use a novel methodological approach by testing the hypothesis that those who prefer to use Ukrainian to communicate on Twitter will use Ukrainian (at the expense of Russian) following moments of heightened political awareness and those who prefer to use Russian will do the opposite. Interestingly, our primary finding is a negative result: we do not find evidence that key political events in the Ukrainian crisis led to a reversion to the language of choice at the aggregate level, which is interesting given how much ink has been spilt on the question of the extent to which Euromaidan reflected an underlying Ukrainian vs. Russian conflict. However, we unexpectedly find that both those who prefer Russian and those who prefer Ukrainian begin using Russian with a greater frequency following the annexation of Crimea, thus contributing a whole new set of puzzles – and a method for exploring these puzzles – that can serve as a basis for future research.

Robust classification of protein variation using structural modelling and large-scale data integration

E Baugh, R Simmons-Edler, C. Müller, R Alford, N. Volfovsky, R. Bonneau

Existing methods for interpreting protein variation focus on annotating mutation pathogenicity rather than detailed interpretation of variant deleteriousness and frequently use only sequence-based or structure-based information. We present VIPUR, a computational framework that seamlessly integrates sequence analysis and structural modelling (using the Rosetta protein modelling suite) to identify and interpret deleterious protein variants. To train VIPUR, we collected 9477 protein variants with known effects on protein function from multiple organisms and curated structural models for each variant from crystal structures and homology models. VIPUR can be applied to mutations in any organism's proteome with improved generalized accuracy (AUROC .83) and interpretability (AUPR .87) compared to other methods. We demonstrate that VIPUR's predictions of deleteriousness match the biological phenotypes in ClinVar and provide a clear ranking of prediction confidence. We use VIPUR to interpret known mutations associated with inflammation and diabetes, demonstrating the structural diversity of disrupted functional sites and improved interpretation of mutations associated with human diseases. Lastly, we demonstrate VIPUR's ability to highlight candidate variants associated with human diseases by applying VIPUR to de novo variants associated with autism spectrum disorders.
