Software

aLENS

aLENS is a simulation tool for tracking assemblies of microtubules driven by motor proteins. Microtubules are modeled as straight rigid fibers, and motor proteins are modeled as Hookean springs with active binding-unbinding dynamics governed by a two-stage kinetic Monte Carlo model that imposes detailed balance. A state-of-the-art collision resolution algorithm based on geometrically constrained optimization is incorporated to reach long physical timescales. The package is massively parallel.

View Project

Astrometry.net

We are building an “astrometry engine” to create correct, standards-compliant astrometric metadata for every useful astronomical image ever taken, past and future, in any state of archival disarray. The astrometry engine will take any image and return the astrometry world coordinate system (WCS), i.e., a standards-based description of the (usually nonlinear) transformation between image coordinates and sky coordinates, with absolutely no “false positives” (but maybe some “no answers”). It will do its best, even when the input image has no metadata or totally incorrect metadata. We intend to install the engine for real-time operation on the web, at observatories, at plate-scanning projects, and at data archives.

View Project

Auxiliary-Field Quantum Monte Carlo

The AFQMC library aims to provide a comprehensive software environment using auxiliary-field quantum Monte Carlo, broadly defined, to study quantum many-fermion systems.

The library will contain a suite of software for
1. correlated electron models and ultracold fermionic atom systems
2. molecular calculations treating standard quantum chemistry Hamiltonians (for example, using Gaussian basis sets)
3. ab initio calculations in extended systems with planewaves and pseudopotentials

View Project

BridgeStan

BridgeStan provides efficient in-memory access to a Stan model through Python, R, Julia, Rust, and C. Stan is a differentiable probabilistic programming language for coding Bayesian statistical models and their posterior predictive inferences. BridgeStan provides access to log densities and their gradients and Hessians, variable names and sizes, constraining and unconstraining parameter transforms, and predictive simulation.

View Project

CaImAn Python

Recent advances in calcium imaging acquisition techniques are creating datasets on the order of terabytes per week. Memory- and computationally efficient algorithms are required to analyze terabytes of data in a reasonable amount of time. This project implements a set of essential methods required in the analysis pipeline for calcium imaging movies. Fast and scalable algorithms are implemented for motion correction, movie manipulation, and source and spike extraction. CaImAn also contains some routines for the analysis of behavior from video cameras. In summary, CaImAn provides a general-purpose tool to handle large movies, with special emphasis on tools for two-photon and one-photon calcium imaging and behavioral datasets.

View Project

celerite

celerite is a library for fast and scalable Gaussian Process (GP) Regression in one dimension with implementations in C++, Python, and Julia. The Python implementation is the most stable and it exposes the most features but it relies on the C++ implementation for computational efficiency. This documentation won’t teach you the fundamentals of GP modeling but the best resource for learning about this is available for free online: Rasmussen & Williams (2006).

View Project

cuFINUFFT

A GPU implementation of the 2- and 3-dimensional non-uniform FFT of types 1 and 2, in single and double precisions, based on the CPU code FINUFFT.

View Project

DAFT

Daft is a Python package that uses matplotlib to render pixel-perfect probabilistic graphical models for publication in a journal or on the internet. With a short Python script and an intuitive model-building syntax you can design directed (Bayesian networks, directed acyclic graphs) and undirected (Markov random fields) models and save them in any format that matplotlib supports (including PDF, PNG, EPS and SVG).

View Project

DeepSEA

DeepSEA is a deep learning-based algorithmic framework for predicting the chromatin effects of sequence alterations with single-nucleotide sensitivity. DeepSEA can accurately predict the epigenetic state of a sequence, including transcription factor binding, DNase I sensitivity and histone marks in multiple cell types, and can further use this capability to predict the chromatin effects of sequence variants and prioritize regulatory variants.

View Project

emcee

emcee is an MIT licensed pure-Python implementation of Goodman & Weare’s Affine Invariant Markov chain Monte Carlo (MCMC) Ensemble sampler and these pages will show you how to use it. This documentation won’t teach you too much about MCMC but there are a lot of resources available for that (try this one). We also published a paper explaining the emcee algorithm and implementation in detail.
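
For intuition, the Goodman & Weare "stretch move" at the heart of the affine-invariant ensemble sampler can be sketched in pure Python. This is an illustrative sketch only, not emcee's implementation or API: the function name `stretch_move_sampler`, its parameters, and the serial one-walker-at-a-time update are assumptions made here for brevity.

```python
import math
import random

def stretch_move_sampler(log_prob, walkers, n_steps, a=2.0, rng=None):
    """Hypothetical helper: sample with serial Goodman & Weare stretch moves.

    log_prob : function mapping a position (list of floats) to a log density
    walkers  : list of starting positions, one per walker (at least two)
    a        : stretch scale parameter (2.0 is a common choice)
    """
    rng = rng or random.Random(0)
    n_walkers = len(walkers)
    dim = len(walkers[0])
    walkers = [list(w) for w in walkers]
    log_p = [log_prob(w) for w in walkers]
    chain = []
    for _ in range(n_steps):
        for k in range(n_walkers):
            # Pick a complementary walker j != k uniformly at random.
            j = rng.randrange(n_walkers - 1)
            if j >= k:
                j += 1
            # Draw z with density proportional to 1/sqrt(z) on [1/a, a].
            z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
            # Stretch walker k along the line through walker j.
            proposal = [xj + z * (xk - xj)
                        for xk, xj in zip(walkers[k], walkers[j])]
            new_log_p = log_prob(proposal)
            # Acceptance probability min(1, z^(dim-1) * p(proposal) / p(current)).
            log_accept = (dim - 1) * math.log(z) + new_log_p - log_p[k]
            if rng.random() < math.exp(min(0.0, log_accept)):
                walkers[k] = proposal
                log_p[k] = new_log_p
        chain.extend(list(w) for w in walkers)
    return chain

def log_gaussian(x):
    """Log density of a standard normal, up to an additive constant."""
    return -0.5 * sum(v * v for v in x)

start_rng = random.Random(1)
start = [[start_rng.uniform(-1, 1), start_rng.uniform(-1, 1)] for _ in range(10)]
samples = stretch_move_sampler(log_gaussian, start, n_steps=2000)
```

The proposal depends only on the relative positions of walkers, which is what makes the move insensitive to affine rescaling of the target distribution.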

View Project

EXP

The EXP C++ library is an efficient N-body simulation toolkit that implements basis-function methods using hybrid CPU and GPU code alongside Python bindings.

View Project

Fast sinc transform library

Fast sinc transform libraries which compute sums of the sinc and sinc² kernels between N arbitrary points in 1, 2, or 3 dimensions. This has applications in MRI and band-limited function approximation. The naive cost is O(N²) whereas our algorithm is quasi-linear in N. Written by our 2017 summer intern Hannah Lawrence.
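
As a point of reference, the naive quadratic-cost sum that the library accelerates can be written in a few lines for the 1D case. This is a baseline sketch only, assuming the unnormalized convention sinc(x) = sin(x)/x; the names `sinc` and `naive_sinc_sum` are hypothetical and are not the library's API.

```python
import math

def sinc(x):
    """Unnormalized sinc, sin(x)/x, equal to 1 at x = 0 (assumed convention)."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def naive_sinc_sum(points, weights, squared=False):
    """Direct O(N^2) evaluation of sum_j w_j * sinc(x_i - x_j) for every i.

    With squared=True the sinc^2 kernel is used instead.
    """
    out = []
    for xi in points:
        total = 0.0
        for xj, wj in zip(points, weights):
            s = sinc(xi - xj)
            total += wj * (s * s if squared else s)
        out.append(total)
    return out

pts = [0.0, math.pi, 2.5]
vals = naive_sinc_sum(pts, [1.0, 1.0, 1.0])
```

The two nested loops over all N points are the source of the quadratic cost mentioned above.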

View Project

FFT-accelerated Interpolation-based t-SNE

FFT-accelerated interpolation-based t-SNE (FIt-SNE) is an efficient implementation of t-SNE (t-distributed stochastic neighbor embedding) for dimensionality reduction and visualization of high-dimensional datasets. This code is able to perform 1000 iterations of t-SNE on one million data points in under 2 minutes on a desktop, which is many times faster than any other existing code. Written by Manas Rachh with collaborators at Yale.

View Project

Figurl

Figurl lets you use Python to generate shareable figURLs (permalinks) to interactive visualizations. With minimal configuration, these can be generated from any computer with access to the internet. Data objects required for the visualization are stored in kachery-cloud and are referenced by content-address strings. Domain-specific visualization plugins are also stored in the cloud and are developed using ReactJS. The central website, figurl.org, pairs the visualization plugin with the data object to create the shareable interactive views.

View Project

FINUFFT

FINUFFT is a set of libraries to efficiently compute three types of nonuniform fast Fourier transform (NUFFT) to a specified precision, in one, two, or three dimensions, on a multi-core shared-memory machine. The library has a very simple interface, does not need any precomputation step, is written in C++ (using OpenMP and FFTW), and has wrappers to C, Fortran, MATLAB, Octave, and Python. As an example, given M arbitrary real numbers x_j and complex numbers c_j, with j=1,…,M, and a requested integer number of modes N, the 1D type-1 (aka “adjoint”) transform evaluates the N numbers.
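
To make the 1D type-1 transform concrete, the direct (slow) sum can be sketched as follows. The convention used here is an assumption, not taken from the text above: f_k = Σ_j c_j exp(i k x_j) for integer frequencies k running upward from -N/2 (rounded toward zero for odd N). The function `naive_type1_nufft` is a hypothetical reference routine, not part of the FINUFFT interface.

```python
import cmath

def naive_type1_nufft(x, c, n_modes):
    """Direct O(N*M) evaluation of f_k = sum_j c_j * exp(i*k*x_j).

    Assumed convention: integer frequencies k = -(n_modes // 2), ...,
    returned in increasing order of k.
    """
    k_min = -(n_modes // 2)
    return [
        sum(cj * cmath.exp(1j * k * xj) for xj, cj in zip(x, c))
        for k in range(k_min, k_min + n_modes)
    ]

x = [0.3, 1.7, -2.1]                 # M = 3 nonuniform points
c = [1 + 0j, 0.5j, -2 + 1j]          # complex strengths
f = naive_type1_nufft(x, c, 4)       # modes k = -2, -1, 0, 1
```

A direct sum like this is useful mainly as a correctness check on small inputs, since its cost grows as the product of N and M.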

View Project

FMM3D

FMM3D is a set of libraries to compute N-body interactions governed by the Laplace and Helmholtz equations, to a specified precision, in three dimensions, on a multi-core shared-memory machine. The 3D fast multipole method evaluates potentials (and gradients, etc) at a large number of targets due to a large number of sources, in linear or quasi-linear time. Our implementation exploits efficient plane wave expansions, SIMD-accelerated kernel evaluations, and multi-threading.

View Project

FNTM

Tissue-specific Interactions: FNTM leverages a tissue-specific gold standard to automatically up-weight datasets relevant to a tissue from a large data compendium of diverse tissues and cell-types. The resulting functional networks accurately capture tissue-specific functional interactions.
Multi-tissue Analysis: Beyond questions pertaining to the role of single genes in single tissues, FNTM also enables examination of changes in gene function across tissues on a broad scale. Users can compare a gene’s functional interaction in different tissues by selecting the relevant tissues in the dropdown menu.

View Project

Gala

Galactic Dynamics is the study of the formation, history, and evolution of galaxies using the orbits of objects — numerically-integrated trajectories of stars, dark matter particles, star clusters, or galaxies themselves. Gala is an Astropy-affiliated Python package that aims to provide efficient tools for performing common tasks needed in Galactic Dynamics research. Much of this code uses Python for flexible, user-friendly interfaces that interact with wrappers around low-level code (primarily C) to enable fast computations. Common operations include gravitational potential and force evaluations, orbit integrations, dynamical coordinate transformations, and computing chaos indicators for nonlinear dynamics. Gala heavily uses the units and astronomical coordinate systems defined in the Astropy core package.

View Project

George

George is a fast and flexible Python library for Gaussian Process Regression. It capitalizes on the Hierarchical Off-Diagonal Low-Rank formalism to make controlled approximations for fast execution.

View Project

GIANT

Tissue-specific Interactions: GIANT leverages a tissue-specific gold standard to automatically up-weight datasets relevant to a tissue from a large data compendium of diverse tissues and cell-types. The resulting functional networks accurately capture tissue-specific functional interactions.
Multi-tissue Analysis: Beyond questions pertaining to the role of single genes in single tissues, GIANT also enables examination of changes in gene function across tissues on a broad scale. Users can compare a gene’s functional interaction in different tissues by selecting the relevant tissues in the dropdown menu.
NetWAS Analysis: GIANT can effectively reprioritize functional associations from a genome-wide association study (GWAS) and potentially identify additional disease-associated genes. The approach, named NetWAS, can be applied to any GWAS study, and does not require that the phenotype or disease have any known associated genes.

View Project

HumanBase

HumanBase applies machine learning algorithms to learn biological associations from massive genomic data collections. These integrative analyses reach beyond existing “biological knowledge” represented in the literature to identify novel, data-driven associations.

View Project

IMP 2.0

Analyze your experimental results in the functional context of gene-gene networks from multiple organisms. Use IMP to direct additional functional experiments by identifying novel gene participants in a pathway or additional processes that a gene of interest participates in.

View Project

IronClust

IronClust is a fast and drift-resistant spike sorting pipeline. The accuracy of spike sorting is validated by multiple ground-truth datasets from a number of contributing labs. IronClust can take advantage of a GPU or a compute cluster if available. IronClust requires MATLAB with the image, parallel, and signal processing toolboxes. IronClust supports Windows, Mac, and Linux.

View Project

ISO-SPLIT

ISO-SPLIT is an efficient clustering algorithm that handles an unknown number of unimodal clusters in low to moderate dimension, without any user-adjustable parameters. It is based on repeated tests for unimodality (using isotonic regression and a modified Hartigan dip test) applied to 1D projections of pairs of putative clusters. It handles non-Gaussian clusters of widely varying densities and populations well, and in such settings has been shown to outperform K-means variants, Gaussian mixture models, and density-based methods.
This repository contains an efficient single-threaded implementation in C++, with a MATLAB/MEX interface.
It was invented and coded by Jeremy Magland, with contributions to the algorithm and tests by Alex Barnett, at SCDA/Flatiron Institute.
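
Isotonic regression, one of the building blocks named above, can be illustrated with the classic pool-adjacent-violators procedure. The sketch below shows only that single ingredient under simple assumptions (equal weights, non-decreasing fit); it is not the ISO-SPLIT algorithm or code from this repository, and `isotonic_regression` is a hypothetical name.

```python
def isotonic_regression(y):
    """Least-squares non-decreasing fit to y via pool-adjacent-violators."""
    blocks = []  # each block is [sum of values, number of values]
    for v in y:
        blocks.append([v, 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fit = []
    for s, n in blocks:
        fit.extend([s / n] * n)  # every point in a block takes the block mean
    return fit

fitted = isotonic_regression([1, 3, 2, 4])
```

Comparing data against such a monotone fit is one way to quantify how far a 1D sample departs from a monotone (and hence, piecewise, unimodal) shape.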

View Project

ITensor

The ITensor library supports the productive development of robust and efficient software based on tensor networks. The innovative design of the ITensor library lets users focus on the connectivity of tensor networks instead of lower-level considerations, and is modeled on tensor diagram notation. ITensor has a multi-layered design including basic dense tensors; sparse tensor types; sophisticated handling of quantum number symmetries; and high-level algorithms for matrix product states.
In addition to providing ongoing support for ITensor and its users, there is a major effort at CCQ to continue expanding the scope of ITensor to make handling complex quantum models easier and to continue incorporating the latest algorithmic developments and tensor network formats.

View Project

Kachery-cloud

Kachery-cloud is a network for sharing scientific data files, live feeds, mutable data and calculation results between lab computers and browser-based user interfaces. Resources are organized into projects which are accessed via registered Python clients. Using simple Python commands you can store files, data objects, mutables or live feeds, and then retrieve or access these on a remote machine (or in a browser via JavaScript) by referencing universal URI strings. In the case of static content, URIs are essentially content hashes, thus forming a content-addressable storage database. While the primary purpose of kachery-cloud at this time is to support figurl, it can also be used independently in collaborative scientific research workflows and for improving scientific reproducibility and dissemination.
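
The idea of content-addressable storage described above can be sketched with the Python standard library. This is a toy illustration, not the kachery-cloud client: the choice of SHA-1, the `sha1://` URI form, the in-memory `store`, and the helpers `content_uri`, `put`, and `get` are all assumptions made for the example.

```python
import hashlib

def content_uri(data: bytes) -> str:
    """Build a content-address URI from the hash of the bytes (assumed scheme)."""
    return "sha1://" + hashlib.sha1(data).hexdigest()

store = {}  # in-memory stand-in for a remote content-addressable store

def put(data: bytes) -> str:
    """Store the bytes under their content URI and return that URI."""
    uri = content_uri(data)
    store[uri] = data
    return uri

def get(uri: str) -> bytes:
    """Fetch bytes by URI and verify that they still match their address."""
    data = store[uri]
    if content_uri(data) != uri:
        raise ValueError("stored content does not match its address")
    return data

uri = put(b"spike times: 0.1 0.4 0.9")
assert get(uri) == b"spike times: 0.1 0.4 0.9"
```

Because the address is derived from the content, identical files always map to the same URI, and any retrieved file can be checked against the URI that referenced it.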

View Project

KNNimpute

KNNimpute is an implementation of the k-nearest neighbors algorithm for estimation of missing values in microarray data. In our comparative study of several different methods used for missing value estimation we determined that KNNimpute provides superior performance in a variety of situations.
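
The general idea of k-nearest-neighbor imputation can be sketched as follows. This is a simplified illustration, not the KNNimpute code: the distance (root-mean-square difference over columns observed in both rows), the unweighted average, and the name `knn_impute` are assumptions made for the example, with rows standing in for genes and `None` marking a missing value.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows."""
    def distance(a, b):
        # Compare only the columns that are observed in both rows.
        shared = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((u - v) ** 2 for u, v in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for col, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors: other rows with this column observed.
            candidates = sorted(
                (distance(row, other), other[col])
                for j, other in enumerate(rows)
                if j != i and other[col] is not None
            )
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][col] = sum(nearest) / len(nearest)
    return filled

data = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 3.2],
    [8.0, 9.0, 10.0],
]
result = knn_impute(data, k=2)
```

In this toy input the first row's missing entry is filled from the two rows with similar profiles, while the dissimilar fourth row is ignored.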

View Project

Masala

Despite its power and versatility, the Rosetta software suite struggles to take full advantage of modern computing hardware. Masala is a set of C++ libraries under development in the Biomolecular Design Group that are intended to replace much of the functionality of Rosetta, and to harness modern massively parallel computing hardware (CPUs, GPUs or FPGAs), or current and near-future generation quantum computers (quantum annealers or gate-based systems) for NP-hard optimization problems. Masala’s versatile plugin system will also permit third parties to develop specialized molecular modelling protocols. Masala is intended for use either as a support library for other software (such as Rosetta) or as a standalone set of applications, and we plan to release it as free and open-source software in the near future.

View Project

MESA

MESA is a suite of open-source, robust, efficient, thread-safe libraries extensively used in computational stellar astrophysics. Its wide-ranging capabilities allow the simulation of diverse stellar evolution scenarios, from low-mass to massive stars, including advanced evolutionary stages and binary interactions. It uses adaptive mesh refinement and sophisticated timestep controls and supports shared memory parallelism based on OpenMP. State-of-the-art modules provide equations of state, opacity, nuclear reaction rates, element diffusion data, and atmosphere boundary conditions. Each module is constructed as a separate Fortran 95 library with its own explicitly defined public interface to facilitate independent development.

View Project

MountainSort/MountainLab

MountainSort is spike sorting software developed by Jeremy Magland, Alex Barnett, and Leslie Greengard at the Center for Computational Biology, Flatiron Institute in close collaboration with Jason Chung and Loren Frank at UCSF department of Physiology. MountainSort is a plugin package to MountainLab, a general framework for scientific data analysis, sharing, and visualization.

MountainLab is data processing, sharing and visualization software for scientists. It is built around MountainSort, a spike sorting algorithm, but is designed to be more generally applicable.

View Project

Nano-Dissection

This server performs in silico nano-dissection, an approach we developed to identify genes with novel cell-lineage specific expression. This method leverages high-throughput functional genomics data from tissue homogenates to accurately predict genes enriched in specific cell types.

View Project

NeMoS

A statistical modeling framework for systems neuroscience. NeMoS specializes in GPU-accelerated optimizations. Its current core functionality includes the implementation of the Generalized Linear Model (GLM) for spike train and calcium imaging analysis.

View Project

NetKet

NetKet is an open-source project delivering advanced methods for the study of many-body quantum systems with artificial neural networks and machine learning techniques. It is the first open-source platform supporting collaborative developments in the field and aims to be a robust yet highly responsive reference implementation for both consolidated and new, more experimental, techniques.
One of the main features of this software is the ability to find the ground state of interacting Hamiltonians using neural network–based ansatz states for the many-body wave function. Because of the modular infrastructure of the library, it is possible to highly customize most of its components. For example, changing Hamiltonians, observables and other problem-dependent quantities is meant to be easy and does not require an in-depth knowledge of programming languages.
To stimulate a large-scale conceptual and practical development of the software, NetKet has introduced a series of “Challenges” tasks, to be tackled by researchers worldwide.

View Project

Octopus

Octopus is a quantum mechanics package aimed at ab initio virtual experimentation on an ever-increasing range of system types, enabling reliable and accurate simulation of light-induced changes in the physical and chemical properties of complex systems. The first-principles, real-space-based Octopus project provides a unique framework to describe non-equilibrium phenomena in molecular complexes, low-dimensional materials, and extended systems by accounting for electronic, ionic, and photon quantum mechanical effects within a generalized time-dependent density functional theory (TDDFT) framework and beyond. The Octopus package enables the simulation and characterization of spatially and time-resolved spectroscopies, ultrafast phenomena in molecules and materials, and new emergent states of matter (QED-materials and QED-chemistry).

View Project

plenoptic

`plenoptic` is a Python library for model-based stimulus synthesis. It provides tools to help researchers understand their model by synthesizing novel informative stimuli, which help build intuition for what features the model ignores and what it is sensitive to. These synthetic images can then be used in future perceptual or neural experiments for further investigation.

View Project

Pyia

Pyia is a Python package for interacting and working with data from the Gaia Mission.

View Project

PYthon Neural Analysis Package (Pynapple)

Pynapple is a lightweight Python library for neurophysiological data analysis. The goal is to offer a versatile set of tools to study typical data in the field, i.e. time series (spike times, behavioral events, etc.) and time intervals (trials, brain states, etc.). It also provides users with generic functions for neuroscience such as tuning curves and cross-correlograms.

View Project

RealNeuralNetworks.jl

Because neurons and blood vessels are string-like, they can be abstracted as curved tubes with centerlines and radii. This representation can be used for morphological analysis, such as measuring path length and branching angle. Given an accurate voxel segmentation, the computation of object centerlines and radii is called skeletonization. RealNeuralNetworks.jl was developed to do that. Unlike most related packages, it combines the synaptic connectivity graph with morphological features and can be used to explore the relationship between synaptic connectivity and morphology. Julia, a relatively new programming language, is becoming popular in data science. RealNeuralNetworks.jl is a Julia package, and its algorithms are written from scratch to reduce dependencies and improve efficiency.

View Project

Riccati

Riccati is an efficient numerical solver developed for a class of ordinary differential equations whose solutions may exhibit extremely rapid oscillations. Standard routines available from scientific computing libraries typically struggle with these types of equations: their runtime grows with the oscillation frequency. Riccati can achieve a frequency-independent (constant) runtime. The package is written in Python, complete with documentation, tests, and interactive examples. It implements the robust algorithm described in Agocs & Barnett (2022), which is able to switch between two different numerical methods on the fly, adapting to the behavior of the solution, as well as choose its own stepsize and other parameters to achieve a user-specified accuracy.

View Project

Rosetta

For over twenty years, the Rosetta software suite has been one of the foremost software packages for modeling, docking, designing, and predicting structures of proteins built from the 20 canonical amino acid building-blocks. Our work has expanded the palette of building-blocks that can be accurately modelled in Rosetta and has allowed us to produce mixed-chirality peptide macrocycles targeting a number of proteins of medicinal relevance.

View Project

RosettaQM

A major limitation of conventional macromolecular modelling approaches is the use of force fields to compute energies. Although force fields are fast, they are highly approximate, and must be tuned and calibrated for a particular set of chemical building-blocks. Quantum chemistry methods offer far greater accuracy and generality. Long thought to be too slow for macromolecular modelling applications, recent work in semi-empirical methods and in fragment molecular orbital methods has brought quantum chemistry calculations performed on large macromolecules into the realm of computational tractability. RosettaQM is a project started in the Biomolecular Design Group that permits Rosetta to carry out quantum mechanical calculations in the context of any design, docking or structure prediction protocol. Development is ongoing.

View Project

SEEK

SEEK is a computational gene co-expression search engine. SEEK provides biologists with a way to navigate the massive human expression compendium that now contains thousands of expression datasets. SEEK returns a robust ranking of co-expressed genes in the biological area of interest defined by the user’s query genes. At the same time, it prioritizes thousands of expression datasets according to the user’s query of interest. The unique strengths of SEEK include its support for multi-gene queries and cross-platform analysis, as well as its rich visualization features.

View Project

SkellySim

SkellySim is a package for simulating cellular components such as flexible filaments, motor proteins, and arbitrary rigid bodies. It’s designed to be highly scalable, capable of both OpenMP and MPI style parallelism, while using the efficient STKFMM/PVFMM libraries for hydrodynamic resolution.

View Project

Sleipnir

Sleipnir is a C++ library enabling efficient analysis, integration, mining, and machine learning over genomic data. This includes a particular focus on microarrays, since they make up the bulk of available data for many organisms, but Sleipnir can also integrate a wide variety of other data types, from pairwise physical interactions to sequence similarity or shared transcription factor binding sites. All analysis is done with attention to speed and memory usage, enabling the integration of hundreds of datasets covering tens of thousands of genes. In addition to the core library, Sleipnir comes with a variety of pre-made tools, providing solutions to common data processing tasks and examples to help you use Sleipnir in your own programs. Sleipnir is free, open source, fully documented, and ready to be used by itself or as a component in your computational biology analyses.

View Project

SpikeForest

SpikeForest is a reproducible, continuously updating platform which benchmarks the performance of spike sorting codes across a large curated database of electrophysiological recordings with ground truth. It consists of this website for presenting our up-to-date findings, a Python package which contains the tools for running the SpikeForest analysis, and an expanding collection of electrophysiology recordings with ground-truth spiking information.

View Project

Stan

Stan is an open-source platform for statistical modeling and high-performance statistical computation. Users rely on Stan for statistical modeling, data analysis and prediction in the social, biological and physical sciences, engineering, business, medicine, finance, education and sports. Users specify log density functions in Stan’s probabilistic programming language and get full Bayesian statistical inference with MCMC sampling (NUTS, HMC), approximate Bayesian inference with variational inference (ADVI), and penalized maximum likelihood estimation with optimization and Laplace approximation (L-BFGS). Stan’s math library provides differentiable real and complex special functions, probability functions and linear algebra using reverse- and forward-mode automatic differentiation. Stan has interfaces supporting statistical workflow in R, Python and Julia.

View Project

STARRY

starry enables the computation of fast and precise light curves for various applications in astronomy: transits and secondary eclipses of exoplanets, light curves of eclipsing binaries, rotational phase curves of exoplanets, light curves of planet-planet and planet-moon occultations, and more.

View Project

STKFMM

This is a numerical computation package for various single- and double-layer kernels for Laplace and Stokes operators in boundary integral methods, implemented on top of the highly optimized kernel-independent fast multipole method package PVFMM. All kernel evaluations are hand-optimized with AVX2 SIMD instructions to maximize the speed of evaluations.

View Project

TRIQS

The TRIQS (Toolbox for Research in Interacting Quantum Systems) project provides a Python/C++ library of basic components for implementing cutting-edge algorithms for the quantum many-body problem, with a focus on quantum embedding and quantum Monte Carlo methods. The project also includes a series of applications, e.g. state-of-the-art quantum impurity solvers, and an interface to electronic structure codes for DFT+DMFT computations.

View Project

URSA(HD)

This is now the home of URSA and URSA(HD). Leveraging gene expression profiles of thousands of tissue and disease samples, URSA and URSA(HD) identify distinct molecular signatures of individual tissues and diseases. Submit your gene expression profile to use these molecular signatures and ascertain the tissue and disease signal in your data.

View Project