Flatiron Software

Auxiliary-Field Quantum Monte Carlo

The AFQMC library aims to provide a comprehensive software environment using auxiliary-field quantum Monte Carlo, broadly defined, to study quantum many-fermion systems.

The library will contain a suite of software for
1. correlated electron models and ultracold fermionic atom systems
2. molecular calculations treating standard quantum chemistry Hamiltonians (for example, using Gaussian basis sets)
3. ab initio calculations in extended systems with planewaves and pseudopotentials

NetKet

NetKet is an open-source project delivering advanced methods for the study of many-body quantum systems with artificial neural networks and machine learning techniques. It is the first open-source platform supporting collaborative developments in the field and aims to be a robust yet highly responsive reference implementation for both consolidated and new, more experimental, techniques.

One of the main features of this software is the ability to find the ground state of interacting Hamiltonians using neural network–based ansatz states for the many-body wave function. Because of the modular infrastructure of the library, it is possible to highly customize most of its components. For example, changing Hamiltonians, observables and other problem-dependent quantities is meant to be easy and does not require an in-depth knowledge of programming languages.

To stimulate a large-scale conceptual and practical development of the software, NetKet has introduced a series of “Challenges” tasks, to be tackled by researchers worldwide.

ITensor

The ITensor library supports the productive development of robust and efficient software based on tensor networks. The innovative design of the ITensor library lets users focus on the connectivity of tensor networks instead of lower-level considerations, and is modeled on tensor diagram notation. ITensor has a multi-layered design including basic dense tensors; sparse tensor types; sophisticated handling of quantum number symmetries; and high-level algorithms for matrix product states.

In addition to providing ongoing support for ITensor and its users, there is a major effort at CCQ to continue expanding the scope of ITensor to make handling complex quantum models easier and to continue incorporating the latest algorithmic developments and tensor network formats.

TRIQS

The TRIQS (Toolbox for Research in Interacting Quantum Systems) project provides a Python/C++ library of basic components for implementing cutting-edge algorithms for the quantum many-body problem, with a focus on quantum embedding and quantum Monte Carlo methods. The project also includes a series of applications, e.g., state-of-the-art quantum impurity solvers, and an interface to electronic structure codes for DFT+DMFT computations.

FINUFFT

FINUFFT is a set of libraries that efficiently compute three types of nonuniform fast Fourier transform (NUFFT) to a specified precision, in one, two, or three dimensions, on a multi-core shared-memory machine. The library has a very simple interface, does not need any precomputation step, is written in C++ (using OpenMP and FFTW), and has wrappers for C, Fortran, MATLAB, Octave, and Python. As an example, given M arbitrary real numbers x_j and complex numbers c_j, with j=1,…,M, and a requested integer number of modes N, the 1D type-1 (aka “adjoint”) transform evaluates the N numbers f_k = sum_{j=1}^{M} c_j exp(i k x_j), for integers k with -N/2 <= k <= N/2-1.
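
To make the 1D type-1 transform concrete, the sketch below evaluates the sum f_k = sum_j c_j exp(i k x_j) directly with NumPy. It is a naive O(NM) reference evaluation, not FINUFFT's fast algorithm or its API; the function name `nudft_type1_1d` and the mode ordering (k increasing from -N/2) are assumptions made for illustration.

```python
import numpy as np

def nudft_type1_1d(x, c, n_modes):
    """Directly evaluate f_k = sum_j c_j * exp(1j * k * x_j).

    Hypothetical helper (not part of FINUFFT). Modes run over the
    integers k = -N/2, ..., N/2 - 1 for even N, in increasing order.
    """
    x = np.asarray(x, dtype=float)
    c = np.asarray(c, dtype=complex)
    # Integer frequencies centered on zero.
    k = np.arange(-(n_modes // 2), (n_modes + 1) // 2)
    # (N x M) matrix of phases times the length-M strength vector.
    return np.exp(1j * np.outer(k, x)) @ c

# Example: M = 50 random nonuniform points, N = 8 output modes.
rng = np.random.default_rng(0)
M, N = 50, 8
x = rng.uniform(-np.pi, np.pi, M)
c = rng.normal(size=M) + 1j * rng.normal(size=M)
f = nudft_type1_1d(x, c, N)
```

A direct sum like this is only practical for small M and N, but it is a convenient way to check the output of a fast transform on a small problem.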

MountainSort/MountainLab

MountainSort is spike sorting software developed by Jeremy Magland, Alex Barnett, and Leslie Greengard at the Center for Computational Biology, Flatiron Institute, in close collaboration with Jason Chung and Loren Frank at the UCSF Department of Physiology. MountainSort is a plugin package to MountainLab, a general framework for scientific data analysis, sharing, and visualization.

MountainLab is data processing, sharing, and visualization software for scientists. It is built around MountainSort, a spike sorting algorithm, but is designed to be more generally applicable.

ISO-SPLIT

ISO-SPLIT is an efficient clustering algorithm that handles an unknown number of unimodal clusters in low to moderate dimension, without any user-adjustable parameters. It is based on repeated tests for unimodality (using isotonic regression and a modified Hartigan dip test) applied to 1D projections of pairs of putative clusters. It handles non-Gaussian clusters of widely varying densities and populations well, and in such settings has been shown to outperform K-means variants, Gaussian mixture models, and density-based methods.

This repository contains an efficient single-threaded implementation in C++, with a MATLAB/MEX interface.

It was invented and coded by Jeremy Magland, with contributions to the algorithm and tests by Alex Barnett, at SCDA/Flatiron Institute.

CaImAn-MATLAB

A Computational toolbox for large-scale Calcium Imaging data Analysis. The code implements the CNMF algorithm for simultaneous source extraction and spike inference from large-scale calcium imaging movies. Many more features are included. The code is suitable for the analysis of somatic imaging data. An improved implementation for the analysis of dendritic/axonal imaging data will be added in the future.

CaImAn Python

Recent advances in calcium imaging acquisition techniques are creating datasets on the order of terabytes per week. Memory- and computationally efficient algorithms are required to analyze terabytes of data in a reasonable amount of time. This project implements a set of essential methods required in the calcium imaging movie analysis pipeline. Fast and scalable algorithms are implemented for motion correction, movie manipulation, and source and spike extraction. CaImAn also contains some routines for the analysis of behavior from video cameras. In summary, CaImAn provides a general-purpose tool to handle large movies, with special emphasis on tools for two-photon and one-photon calcium imaging and behavioral datasets.

GIANT

Tissue-specific Interactions
GIANT leverages a tissue-specific gold standard to automatically up-weight datasets relevant to a tissue from a large data compendium of diverse tissues and cell-types. The resulting functional networks accurately capture tissue-specific functional interactions.

Multi-tissue Analysis
Beyond questions pertaining to the role of single genes in single tissues, GIANT also enables examination of changes in gene function across tissues on a broad scale. Users can compare a gene’s functional interaction in different tissues by selecting the relevant tissues in the dropdown menu.

NetWAS Analysis
GIANT can effectively reprioritize functional associations from a genome-wide association study (GWAS) and potentially identify additional disease-associated genes. The approach, named NetWAS, can be applied to any GWAS study, and does not require that the phenotype or disease have any known associated genes.

DeepSEA

DeepSEA is a deep learning-based algorithmic framework for predicting the chromatin effects of sequence alterations with single-nucleotide sensitivity. DeepSEA can accurately predict the epigenetic state of a sequence, including transcription factor binding, DNase I sensitivities, and histone marks in multiple cell types, and further utilize this capability to predict the chromatin effects of sequence variants and prioritize regulatory variants.

Nano-Dissection

This server performs in silico nano-dissection, an approach we developed to identify genes with novel cell-lineage specific expression. This method leverages high-throughput functional genomics data from tissue homogenates to accurately predict genes enriched in specific cell types.

Sleipnir

Sleipnir is a C++ library enabling efficient analysis, integration, mining, and machine learning over genomic data. This includes a particular focus on microarrays, since they make up the bulk of available data for many organisms, but Sleipnir can also integrate a wide variety of other data types, from pairwise physical interactions to sequence similarity or shared transcription factor binding sites. All analysis is done with attention to speed and memory usage, enabling the integration of hundreds of datasets covering tens of thousands of genes. In addition to the core library, Sleipnir comes with a variety of pre-made tools, providing solutions to common data processing tasks and examples to help you use Sleipnir in your own programs. Sleipnir is free, open source, fully documented, and ready to be used by itself or as a component in your computational biology analyses.

IMP 2.0

Analyze your experimental results in the functional context of gene-gene networks from multiple organisms. Use IMP to direct additional functional experiments by identifying novel gene participants in a pathway or additional processes that a gene of interest participates in.

SEEK

SEEK is a computational gene co-expression search engine. SEEK provides biologists with a way to navigate the massive human expression compendium that now contains thousands of expression datasets. SEEK returns a robust ranking of co-expressed genes in the biological area of interest defined by the user’s query genes. At the same time, it prioritizes thousands of expression datasets according to the user’s query of interest. The unique strengths of SEEK include its support for multi-gene queries and cross-platform analysis, as well as its rich visualization features.

KNNimpute

KNNimpute is an implementation of the k-nearest neighbors algorithm for estimation of missing values in microarray data. In our comparative study of several different methods used for missing value estimation we determined that KNNimpute provides superior performance in a variety of situations.
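
The idea of k-nearest-neighbor imputation can be sketched in a few lines of NumPy. This is a simplified illustration, not the KNNimpute code: the function name `knn_impute`, the use of root-mean-square distance over shared observed columns, and the unweighted neighbor average are all assumptions chosen for brevity.

```python
import numpy as np

def knn_impute(data, k=2):
    """Fill NaN entries using the k most similar rows (hypothetical helper).

    Rows are genes and columns are experiments. For each missing entry,
    rows that have a value in that column are ranked by RMS distance over
    the columns both rows have observed, and the k nearest are averaged.
    """
    data = np.asarray(data, dtype=float)
    filled = data.copy()
    missing = np.isnan(data)
    for i, j in zip(*np.nonzero(missing)):
        candidates = []
        for r in range(data.shape[0]):
            if r == i or missing[r, j]:
                continue  # a neighbor must have the value we need
            shared = ~missing[i] & ~missing[r]
            if not shared.any():
                continue  # no common columns to compare on
            diff = data[i, shared] - data[r, shared]
            candidates.append((np.sqrt(np.mean(diff ** 2)), data[r, j]))
        if candidates:
            candidates.sort(key=lambda pair: pair[0])
            filled[i, j] = np.mean([value for _, value in candidates[:k]])
    return filled

# Toy expression matrix: row 1 is missing its third measurement.
expr = np.array([
    [1.0, 2.0, 3.0],
    [1.1, 2.1, np.nan],
    [5.0, 6.0, 7.0],
    [1.0, 2.0, 3.2],
])
imputed = knn_impute(expr, k=2)
```

In this toy matrix the two rows closest to row 1 are rows 0 and 3, so the missing entry is filled with the average of their third-column values.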

FNTM

Tissue-specific Interactions
FNTM leverages a tissue-specific gold standard to automatically up-weight datasets relevant to a tissue from a large data compendium of diverse tissues and cell-types. The resulting functional networks accurately capture tissue-specific functional interactions.

Multi-tissue Analysis
Beyond questions pertaining to the role of single genes in single tissues, FNTM also enables examination of changes in gene function across tissues on a broad scale. Users can compare a gene’s functional interaction in different tissues by selecting the relevant tissues in the dropdown menu.

URSA(HD)

This is now the home of URSA and URSA(HD). Leveraging gene expression profiles of thousands of tissue and disease samples, URSA and URSA(HD) identify distinct molecular signatures of individual tissues and diseases. Submit your gene expression profile to use these molecular signatures and ascertain the tissue and disease signal in your data.

HumanBase

HumanBase applies machine learning algorithms to learn biological associations from massive genomic data collections. These integrative analyses reach beyond existing “biological knowledge” represented in the literature to identify novel, data-driven associations.

emcee

emcee is an MIT-licensed pure-Python implementation of Goodman & Weare’s Affine Invariant Markov chain Monte Carlo (MCMC) Ensemble sampler. These pages will show you how to use it.

This documentation won’t teach you too much about MCMC, but there are a lot of resources available for that. We also published a paper explaining the emcee algorithm and implementation in detail.
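
For readers unfamiliar with the affine-invariant ensemble approach, the sketch below implements the serial "stretch move" described by Goodman & Weare using only NumPy. It is an illustrative toy, not emcee's implementation or API: the function name `stretch_move_sampler`, its arguments, and the standard-normal target are assumptions.

```python
import numpy as np

def stretch_move_sampler(log_prob, walkers, n_steps, a=2.0, seed=0):
    """Serial stretch-move ensemble sampler (hypothetical helper).

    Each walker k is updated by stretching along the line to another
    randomly chosen walker j: y = x_j + z * (x_k - x_j), where z is
    drawn from g(z) proportional to 1/sqrt(z) on [1/a, a].
    """
    rng = np.random.default_rng(seed)
    walkers = np.array(walkers, dtype=float)
    n_walkers, n_dim = walkers.shape
    logp = np.array([log_prob(w) for w in walkers])
    chain = np.empty((n_steps, n_walkers, n_dim))
    for step in range(n_steps):
        for k in range(n_walkers):
            # Choose a complementary walker j != k uniformly.
            j = rng.integers(n_walkers - 1)
            if j >= k:
                j += 1
            # Inverse-CDF draw of the stretch factor z.
            z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
            proposal = walkers[j] + z * (walkers[k] - walkers[j])
            logp_new = log_prob(proposal)
            # Acceptance probability min(1, z^(d-1) * p(y) / p(x_k)).
            log_accept = (n_dim - 1) * np.log(z) + logp_new - logp[k]
            if np.log(rng.random()) < log_accept:
                walkers[k] = proposal
                logp[k] = logp_new
        chain[step] = walkers
    return chain

def log_prob(theta):
    # Unnormalized log-density of a 2D standard normal (toy target).
    return -0.5 * np.dot(theta, theta)

start = np.random.default_rng(1).normal(size=(20, 2))
chain = stretch_move_sampler(log_prob, start, n_steps=2000)
samples = chain[500:].reshape(-1, 2)  # discard the first 500 steps as burn-in
```

Because the proposal depends only on the relative positions of the walkers, the move behaves the same under any affine rescaling of the parameters, which is the property the sampler is named for.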

George

George is a fast and flexible Python library for Gaussian Process Regression. It capitalizes on the Hierarchical Off-Diagonal Low-Rank formalism to make controlled approximations for fast execution.

celerite

celerite is a library for fast and scalable Gaussian Process (GP) Regression in one dimension, with implementations in C++, Python, and Julia. The Python implementation is the most stable and exposes the most features, but it relies on the C++ implementation for computational efficiency. This documentation won’t teach you the fundamentals of GP modeling, but the best resource for learning about this is available for free online: Rasmussen & Williams (2006).

Astrometry.net

We are building an “astrometry engine” to create correct, standards-compliant astrometric metadata for every useful astronomical image ever taken, past and future, in any state of archival disarray.

The astrometry engine will take any image and return the astrometry world coordinate system (WCS), i.e., a standards-based description of the (usually nonlinear) transformation between image coordinates and sky coordinates, with absolutely no “false positives” (but maybe some “no answers”). It will do its best, even when the input image has no metadata or totally incorrect metadata.

We intend to install the engine for real-time operation on the web, at observatories, at plate-scanning projects, and at data archives.

DAFT

Daft is a Python package that uses matplotlib to render pixel-perfect probabilistic graphical models for publication in a journal or on the internet. With a short Python script and an intuitive model-building syntax, you can design directed (Bayesian networks, directed acyclic graphs) and undirected (Markov random fields) models and save them in any format that matplotlib supports (including PDF, PNG, EPS, and SVG).
