We are building an “astrometry engine” to create correct, standards-compliant astrometric metadata for every useful astronomical image ever taken, past and future, in any state of archival disarray. The astrometry engine will take any image and return the astrometry world coordinate system (WCS), i.e., a standards-based description of the (usually nonlinear) transformation between image coordinates and sky coordinates, with absolutely no “false positives” (but maybe some “no answers”). It will do its best even when the input image has no metadata or totally incorrect metadata. We intend to install the engine for real-time operation on the web, at observatories, at plate-scanning projects, and at data archives.
The AFQMC library aims to provide a comprehensive software environment using auxiliary-field quantum Monte Carlo, broadly defined, to study quantum many-fermion systems.
The library will contain a suite of software for
1. correlated electron models and ultracold fermionic atom systems
2. molecular calculations treating standard quantum chemistry Hamiltonians (for example, using Gaussian basis sets)
3. ab initio calculations in extended systems with planewaves and pseudopotentials
Recent advances in calcium imaging acquisition techniques are creating datasets on the order of terabytes per week. Memory-efficient and computationally efficient algorithms are required to analyze this volume of data in a reasonable amount of time. This project implements a set of essential methods required in the calcium imaging movie analysis pipeline. Fast and scalable algorithms are implemented for motion correction, movie manipulation, and source and spike extraction. CaImAn also contains some routines for the analysis of behavior from video cameras. In summary, CaImAn provides a general-purpose tool to handle large movies, with special emphasis on tools for two-photon and one-photon calcium imaging and behavioral datasets.
A Computational toolbox for large scale Calcium Imaging data Analysis. The code implements the CNMF algorithm for simultaneous source extraction and spike inference from large-scale calcium imaging movies. Many more features are included. The code is suitable for the analysis of somatic imaging data. An improved implementation for the analysis of dendritic/axonal imaging data will be added in the future.
celerite is a library for fast and scalable Gaussian Process (GP) Regression in one dimension with implementations in C++, Python, and Julia. The Python implementation is the most stable and it exposes the most features but it relies on the C++ implementation for computational efficiency. This documentation won’t teach you the fundamentals of GP modeling but the best resource for learning about this is available for free online: Rasmussen & Williams (2006).
A GPU implementation of the 2- and 3-dimensional non-uniform FFT of types 1 and 2, in single and double precision, based on the CPU code FINUFFT.
Daft is a Python package that uses matplotlib to render pixel-perfect probabilistic graphical models for publication in a journal or on the internet. With a short Python script and an intuitive model-building syntax you can design directed (Bayesian Networks, directed acyclic graphs) and undirected (Markov random fields) models and save them in any formats that matplotlib supports (including PDF, PNG, EPS and SVG).
DeepSEA is a deep learning-based algorithmic framework for predicting the chromatin effects of sequence alterations with single-nucleotide sensitivity. DeepSEA can accurately predict the epigenetic state of a sequence, including transcription factor binding, DNase I sensitivity, and histone marks in multiple cell types, and it uses this capability to predict the chromatin effects of sequence variants and to prioritize regulatory variants.
emcee is an MIT-licensed pure-Python implementation of Goodman & Weare’s Affine Invariant Markov chain Monte Carlo (MCMC) Ensemble sampler, and these pages will show you how to use it. This documentation won’t teach you too much about MCMC but there are a lot of resources available for that (try this one). We also published a paper explaining the emcee algorithm and implementation in detail.
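To make the idea of an affine-invariant ensemble sampler concrete, the following is a minimal pure-Python sketch of the Goodman & Weare “stretch move”, written serially for readability. It is not emcee’s code or API: the names `log_prob` and `stretch_move_sampler`, the stretch scale `a = 2.0`, and the standard-normal target are all assumptions chosen for illustration.

```python
import math
import random


def log_prob(x):
    """Hypothetical target: log-density of a standard normal, up to a constant."""
    return -0.5 * sum(v * v for v in x)


def stretch_move_sampler(log_prob_fn, walkers, n_steps, a=2.0, seed=0):
    """Serial stretch-move ensemble sampler (illustrative sketch only).

    walkers: list of starting positions, each a list of floats.
    Returns every walker position recorded after every update.
    """
    rng = random.Random(seed)
    walkers = [list(w) for w in walkers]
    n_walkers, dim = len(walkers), len(walkers[0])
    logp = [log_prob_fn(w) for w in walkers]
    chain = []
    for _ in range(n_steps):
        for k in range(n_walkers):
            # Pick a different walker j to define the stretch direction.
            j = rng.randrange(n_walkers - 1)
            if j >= k:
                j += 1
            # Draw z from g(z) proportional to 1/sqrt(z) on [1/a, a].
            z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
            proposal = [xj + z * (xk - xj)
                        for xk, xj in zip(walkers[k], walkers[j])]
            new_logp = log_prob_fn(proposal)
            # Accept with probability min(1, z^(dim-1) * p(proposal) / p(current)).
            log_accept = (dim - 1) * math.log(z) + new_logp - logp[k]
            if math.log(rng.random() + 1e-300) < log_accept:
                walkers[k], logp[k] = proposal, new_logp
            chain.append(list(walkers[k]))
    return chain


# Usage: 10 walkers in 2 dimensions, started from random positions.
init_rng = random.Random(1)
start = [[init_rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(10)]
samples = stretch_move_sampler(log_prob, start, n_steps=2000)
```

The proposal depends only on the positions of the other walkers, so the sampler needs no hand-tuned proposal covariance. That property is what makes the method invariant under affine transformations of the parameter space.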
Fast sinc transform libraries which compute sums of the sinc and sinc² kernels between N arbitrary points in 1, 2, or 3 dimensions. This has applications in MRI and band-limited function approximation. The naive cost is O(N²) whereas our algorithm is quasi-linear in N. Written by our 2017 summer intern Hannah Lawrence.
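For reference, the naive quadratic-cost sum that the fast algorithm replaces can be sketched in a few lines of Python for the 1D case. This is not the library’s interface: the function names are hypothetical, and the unnormalized convention sinc(x) = sin(x)/x is an assumption, since the library’s exact kernel normalization is not stated here.

```python
import math


def sinc(x):
    """Unnormalized sinc, sin(x)/x, with the limit value 1 at x = 0 (assumed convention)."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def naive_sinc_transform(points, weights, squared=False):
    """Direct O(N^2) evaluation of s_k = sum_j w_j * sinc(x_k - x_j), or sinc^2 if squared."""
    out = []
    for xk in points:
        total = 0.0
        for xj, wj in zip(points, weights):
            s = sinc(xk - xj)
            total += wj * (s * s if squared else s)
        out.append(total)
    return out


# Usage: two points separated by pi, where sinc(pi) is zero up to rounding.
values = naive_sinc_transform([0.0, math.pi], [1.0, 2.0])
```

A direct sum like this is mainly useful as a correctness check for a fast implementation on small N.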
FFT-accelerated interpolation-based t-SNE (FIt-SNE) is an efficient implementation of t-SNE (t-distributed stochastic neighbor embedding) for dimensionality reduction and visualization of high-dimensional datasets. This code is able to perform 1000 iterations of t-SNE on one million data points in under 2 minutes on a desktop, which is many times faster than any other existing code. Written by Manas Rachh with collaborators at Yale.
FINUFFT is a set of libraries to compute efficiently three types of nonuniform fast Fourier transform (NUFFT) to a specified precision, in one, two, or three dimensions, on a multi-core shared-memory machine. The library has a very simple interface, does not need any precomputation step, is written in C++ (using OpenMP and FFTW), and has wrappers to C, Fortran, MATLAB, Octave, and Python. As an example, given M arbitrary real numbers $x_j$ and complex numbers $c_j$, with $j=1,\dots,M$, and a requested integer number of modes N, the 1D type-1 (aka “adjoint”) transform evaluates the N numbers $f_k = \sum_{j=1}^{M} c_j e^{i k x_j}$ for integers $k$ with $-N/2 \le k \le N/2 - 1$.
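As a point of comparison, the 1D type-1 sum f_k = sum_j c_j exp(i k x_j) can be evaluated directly at O(NM) cost. The sketch below does exactly that in pure Python. It does not call FINUFFT: the function name `nufft1d1_direct` is hypothetical, and the mode range used for odd N is an assumption.

```python
import cmath
import math


def nufft1d1_direct(x, c, n_modes):
    """Direct evaluation of f_k = sum_j c_j * exp(i*k*x_j).

    For even n_modes the modes are k = -N/2, ..., N/2 - 1, returned in
    increasing k. For odd n_modes this sketch assumes k = -(N-1)/2, ..., (N-1)/2.
    """
    ks = range(-(n_modes // 2), (n_modes + 1) // 2)
    return [sum(cj * cmath.exp(1j * k * xj) for xj, cj in zip(x, c))
            for k in ks]


# Usage: a single unit-strength point at x = pi/2 and N = 4 modes (k = -2..1).
modes = nufft1d1_direct([math.pi / 2], [1.0 + 0.0j], 4)
```

A brute-force routine like this is a convenient way to check the output of a fast transform on small problems to the requested precision.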
FMM3D is a set of libraries to compute N-body interactions governed by the Laplace and Helmholtz equations, to a specified precision, in three dimensions, on a multi-core shared-memory machine. The 3D fast multipole method evaluates potentials (and gradients, etc) at a large number of targets due to a large number of sources, in linear or quasi-linear time. Our implementation exploits efficient plane wave expansions, SIMD-accelerated kernel evaluations, and multi-threading.
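The quantity that the fast multipole method accelerates is the plain N-body sum. A direct evaluation of the Laplace potential can be sketched in Python as follows. This is not the FMM3D interface: the function name is hypothetical, and the kernel normalization 1/r (with no constant prefactor) is an assumption of the sketch.

```python
import math


def laplace_potential_direct(sources, charges, targets):
    """Direct sum u(t) = sum_j q_j / |t - s_j| over all sources for each target.

    Cost is O(N_targets * N_sources). A source that coincides with a
    target is skipped to avoid the singular self-interaction.
    """
    potentials = []
    for t in targets:
        u = 0.0
        for s, q in zip(sources, charges):
            r = math.dist(t, s)
            if r > 0.0:
                u += q / r
        potentials.append(u)
    return potentials


# Usage: two charges on the x-axis, one target on the z-axis.
u = laplace_potential_direct(
    sources=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    charges=[1.0, 2.0],
    targets=[(0.0, 0.0, 1.0)],
)
```

The cost of this double loop grows with the product of the number of sources and targets, which is the scaling a linear or quasi-linear method is designed to avoid.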
Tissue-specific Interactions: FNTM leverages a tissue-specific gold standard to automatically up-weight datasets relevant to a tissue from a large data compendium of diverse tissues and cell-types. The resulting functional networks accurately capture tissue-specific functional interactions.
Multi-tissue Analysis: Beyond questions pertaining to the role of single genes in single tissues, FNTM also enables examination of changes in gene function across tissues on a broad scale. Users can compare a gene’s functional interaction in different tissues by selecting the relevant tissues in the dropdown menu.
George is a fast and flexible Python library for Gaussian Process Regression. It capitalizes on the Hierarchical Off-Diagonal Low-Rank formalism to make controlled approximations for fast execution.
Tissue-specific Interactions: GIANT leverages a tissue-specific gold standard to automatically up-weight datasets relevant to a tissue from a large data compendium of diverse tissues and cell-types. The resulting functional networks accurately capture tissue-specific functional interactions.
Multi-tissue Analysis: Beyond questions pertaining to the role of single genes in single tissues, GIANT also enables examination of changes in gene function across tissues on a broad scale. Users can compare a gene’s functional interaction in different tissues by selecting the relevant tissues in the dropdown menu.
NetWAS Analysis: GIANT can effectively reprioritize functional associations from a genome-wide association study (GWAS) and potentially identify additional disease-associated genes. The approach, named NetWAS, can be applied to any GWAS study, and does not require that the phenotype or disease have any known associated genes.
HumanBase applies machine learning algorithms to learn biological associations from massive genomic data collections. These integrative analyses reach beyond existing “biological knowledge” represented in the literature to identify novel, data-driven associations.
Analyze your experimental results in the functional context of gene-gene networks from multiple organisms. Use IMP to direct additional functional experiments by identifying novel gene participants in a pathway or additional processes that a gene of interest participates in.
IronClust is a fast and drift-resistant spike sorting pipeline. The accuracy of spike sorting is validated by multiple ground-truth datasets from a number of contributing labs. IronClust can take advantage of a GPU or a compute cluster if available. IronClust requires MATLAB with the image, parallel, and signal processing toolboxes. IronClust supports Windows, Mac, and Linux.
ISO-SPLIT is an efficient clustering algorithm that handles an unknown number of unimodal clusters in low to moderate dimension, without any user-adjustable parameters. It is based on repeated tests for unimodality (using isotonic regression and a modified Hartigan dip test) applied to 1D projections of pairs of putative clusters. It handles non-Gaussian clusters of widely varying densities and populations well, and in such settings has been shown to outperform K-means variants, Gaussian mixture models, and density-based methods.
This repository contains an efficient single-threaded implementation in C++, with a MATLAB/MEX interface.
It was invented and coded by Jeremy Magland, with contributions to the algorithm and tests by Alex Barnett, at SCDA/Flatiron Institute.
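Isotonic regression, one ingredient of the unimodality tests mentioned above, can be illustrated with the classic pool-adjacent-violators procedure. The sketch below shows only that generic building block in pure Python. It is not the ISO-SPLIT code, and the function name is hypothetical.

```python
def isotonic_regression(y, weights=None):
    """Weighted least-squares nondecreasing fit to y via pool-adjacent-violators."""
    if weights is None:
        weights = [1.0] * len(y)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for value, w in zip(y, weights):
        blocks.append([float(value), float(w), 1])
        # Merge backwards while the nondecreasing constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    # Expand each pooled block back to one fitted value per input point.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


# Usage: the out-of-order pair (3, 2) is pooled to its mean.
fitted = isotonic_regression([1, 3, 2, 4])
```

A unimodality test would compare data against fits that rise and then fall. The nondecreasing fit shown here is the simplest case of that idea.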
The ITensor library supports the productive development of robust and efficient software based on tensor networks. The innovative design of the ITensor library lets users focus on the connectivity of tensor networks instead of lower-level considerations, and is modeled on tensor diagram notation. ITensor has a multi-layered design including basic dense tensors; sparse tensor types; sophisticated handling of quantum number symmetries; and high-level algorithms for matrix product states.
In addition to providing ongoing support for ITensor and its users, there is a major effort at CCQ to continue expanding the scope of ITensor to make handling complex quantum models easier and to continue incorporating the latest algorithmic developments and tensor network formats.
KNNimpute is an implementation of the k-nearest neighbors algorithm for estimation of missing values in microarray data. In our comparative study of several different methods used for missing value estimation we determined that KNNimpute provides superior performance in a variety of situations.
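The general idea of nearest-neighbor imputation can be sketched in pure Python as follows. This is a simplified illustration, not the KNNimpute program itself: the function name `knn_impute`, the use of `None` for missing entries, the root-mean-square distance over shared columns, and the inverse-distance weighting are all assumptions of the sketch.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries from the k most similar rows that observe that column.

    rows: list of equal-length lists (for example, genes by experiments).
    Returns a new table and leaves the input unchanged.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for col, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for j, other in enumerate(rows):
                if j == i or other[col] is None:
                    continue
                # Compare the two rows on columns observed in both.
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[col]))
            if not candidates:
                continue  # leave the gap if no neighbor can supply a value
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            # Weight neighbors by inverse distance (epsilon avoids division by zero).
            weights = [1.0 / (d + 1e-12) for d, _ in nearest]
            filled[i][col] = (sum(w * v for w, (_, v) in zip(weights, nearest))
                              / sum(weights))
    return filled


# Usage: the first row is missing its last value and matches the second row.
table = [[1.0, 2.0, None], [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
completed = knn_impute(table, k=1)
```

Distances are computed from the original table rather than from values filled in earlier, so the result does not depend on the order in which gaps are visited.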
MountainSort is spike sorting software developed by Jeremy Magland, Alex Barnett, and Leslie Greengard at the Center for Computational Biology, Flatiron Institute, in close collaboration with Jason Chung and Loren Frank at the UCSF Department of Physiology. MountainSort is a plugin package to MountainLab, a general framework for scientific data analysis, sharing, and visualization.
MountainLab is data processing, sharing, and visualization software for scientists. It is built around MountainSort, a spike sorting algorithm, but is designed to be more generally applicable.
This server performs in silico nano-dissection, an approach we developed to identify genes with novel cell-lineage specific expression. This method leverages high-throughput functional genomics data from tissue homogenates to accurately predict genes enriched in specific cell types.
NetKet is an open-source project delivering advanced methods for the study of many-body quantum systems with artificial neural networks and machine learning techniques. It is the first open-source platform supporting collaborative developments in the field and aims to be a robust yet highly responsive reference implementation for both consolidated and new, more experimental, techniques.
One of the main features of this software is the ability to find the ground state of interacting Hamiltonians using neural network–based ansatz states for the many-body wave function. Because of the modular infrastructure of the library, it is possible to highly customize most of its components. For example, changing Hamiltonians, observables and other problem-dependent quantities is meant to be easy and does not require an in-depth knowledge of programming languages.
To stimulate large-scale conceptual and practical development of the software, NetKet has introduced a series of “Challenge” tasks to be tackled by researchers worldwide.
SEEK is a computational gene co-expression search engine. SEEK provides biologists with a way to navigate the massive human expression compendium that now contains thousands of expression datasets. SEEK returns a robust ranking of co-expressed genes in the biological area of interest defined by the user’s query genes. At the same time, it prioritizes thousands of expression datasets according to the user’s query of interest. The unique strengths of SEEK include its support for multi-gene queries and cross-platform analysis, as well as its rich visualization features.
Sleipnir is a C++ library enabling efficient analysis, integration, mining, and machine learning over genomic data. This includes a particular focus on microarrays, since they make up the bulk of available data for many organisms, but Sleipnir can also integrate a wide variety of other data types, from pairwise physical interactions to sequence similarity or shared transcription factor binding sites. All analysis is done with attention to speed and memory usage, enabling the integration of hundreds of datasets covering tens of thousands of genes. In addition to the core library, Sleipnir comes with a variety of pre-made tools, providing solutions to common data processing tasks and examples to help you use Sleipnir in your own programs. Sleipnir is free, open source, fully documented, and ready to be used by itself or as a component in your computational biology analyses.
SpikeForest is a reproducible, continuously updating platform which benchmarks the performance of spike sorting codes across a large curated database of electrophysiological recordings with ground truth. It consists of this website for presenting our up-to-date findings, a Python package which contains the tools for running the SpikeForest analysis, and an expanding collection of electrophysiology recordings with ground-truth spiking information.
starry enables the computation of fast and precise light curves for various applications in astronomy: transits and secondary eclipses of exoplanets, light curves of eclipsing binaries, rotational phase curves of exoplanets, light curves of planet-planet and planet-moon occultations, and more.
The TRIQS (Toolbox for Research in Interacting Quantum Systems) project provides a Python/C++ library of basic components for implementing cutting-edge algorithms for the quantum many-body problem, with a focus on quantum embedding and quantum Monte Carlo methods. The project also includes a series of applications, e.g. state-of-the-art quantum impurity solvers, and an interface to electronic structure codes for DFT+DMFT computations.
This is now the home of URSA and URSA(HD). Leveraging gene expression profiles of thousands of tissue and disease samples, URSA and URSA(HD) identify distinct molecular signatures of individual tissues and diseases. Submit your gene expression profile to use these molecular signatures and ascertain the tissue and disease signal in your data.