Software

Masala

Despite its power and versatility, the Rosetta software suite struggles to take full advantage of modern computing hardware. Masala is a set of C++ libraries for macromolecular modelling developed by the Biomolecular Design Group and intended for use either as standalone software or as a support library for existing software, such as Rosetta. Masala is ultimately intended to replace much of the functionality of Rosetta, and to harness modern parallel computing hardware (CPUs, GPUs, or FPGAs) or current and near-future quantum computers (quantum annealers or gate-based systems) for NP-hard optimization problems. Its versatile plugin system permits third parties to develop specialized macromolecular modelling tools and to use them seamlessly with existing macromolecular modelling packages that can link Masala, such as Rosetta. The Standard Masala Plugins library contains several high-efficiency optimizers useful for problems like rotamer optimization, sequence design, or conformational relaxation.

View Project

Rosetta

For over twenty years, the Rosetta software suite has been one of the foremost software packages for modeling, docking, designing, and predicting structures of proteins built from the 20 canonical amino acid building-blocks. Our work has expanded the palette of building-blocks that can be accurately modelled in Rosetta and has allowed us to produce mixed-chirality peptide macrocycles targeting a number of proteins of medicinal relevance.

RosettaQM

A major limitation of conventional macromolecular modelling approaches is the use of force fields to compute energies. Although force fields are fast, they are highly approximate, and must be tuned and calibrated for a particular set of chemical building-blocks. Quantum chemistry methods offer far greater accuracy and generality. Although long thought to be too slow for macromolecular modelling applications, quantum chemistry calculations on large macromolecules have been brought into the realm of computational tractability by recent work on semi-empirical methods and fragment molecular orbital methods. RosettaQM is a project started in the Biomolecular Design Group that permits Rosetta to carry out quantum mechanical calculations in the context of any design, docking, or structure prediction protocol. Development is ongoing.

aLENS

aLENS is a simulation tool for tracking assemblies of microtubules driven by motor proteins. Microtubules are modeled as straight rigid fibers, and motor proteins are modeled as Hookean springs with active binding-unbinding dynamics governed by a two-stage kinetic Monte Carlo model that imposes detailed balance.

A state-of-the-art collision resolution algorithm based on geometrically constrained optimization is incorporated to reach long physical dynamics timescales. This package is massively parallel.

View Project

STKFMM

STKFMM is a numerical computation package for various single- and double-layer kernels for Laplace and Stokes operators in boundary integral methods, implemented on top of the highly optimized kernel-independent fast multipole method package PVFMM. All kernel evaluations are hand-optimized with AVX2 SIMD instructions to maximize evaluation speed.
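To make "kernel evaluation" concrete, here is a minimal sketch of what a Laplace single-layer kernel computes when summed directly: the potential at each target point is the sum of q / (4πr) over all source points. This is plain Python written for illustration only. It is not STKFMM's API, the function name `laplace_single_layer_direct` is hypothetical, and a fast multipole method exists precisely to avoid this quadratic-cost double loop.

```python
import math


def laplace_single_layer_direct(sources, charges, targets):
    """Directly sum the 3D Laplace single-layer kernel 1 / (4 * pi * r).

    Illustrative only: this is an O(N * M) reference sum, not the
    fast multipole algorithm and not the STKFMM interface.
    """
    potentials = []
    for tx, ty, tz in targets:
        total = 0.0
        for (sx, sy, sz), q in zip(sources, charges):
            r = math.sqrt((tx - sx) ** 2 + (ty - sy) ** 2 + (tz - sz) ** 2)
            # Skip coincident points, where the kernel is singular.
            if r > 0.0:
                total += q / (4.0 * math.pi * r)
        potentials.append(total)
    return potentials


if __name__ == "__main__":
    sources = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    charges = [1.0, -1.0]
    targets = [(0.0, 2.0, 0.0)]
    print(laplace_single_layer_direct(sources, charges, targets))
```

A direct sum like this is also a common way to check the accuracy of a fast summation on a small problem.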

View Project

DeepSEA

DeepSEA is a deep learning-based algorithmic framework for predicting the chromatin effects of sequence alterations with single-nucleotide sensitivity. DeepSEA can accurately predict the epigenetic state of a sequence, including transcription factor binding, DNase I sensitivity, and histone marks in multiple cell types, and can further use this capability to predict the chromatin effects of sequence variants and prioritize regulatory variants.

View Project

FNTM

Tissue-specific Interactions: FNTM leverages a tissue-specific gold standard to automatically up-weight datasets relevant to a tissue from a large data compendium of diverse tissues and cell-types. The resulting functional networks accurately capture tissue-specific functional interactions.
Multi-tissue Analysis: Beyond questions pertaining to the role of single genes in single tissues, FNTM also enables examination of changes in gene function across tissues on a broad scale. Users can compare a gene’s functional interactions in different tissues by selecting the relevant tissues in the dropdown menu.

View Project

GIANT

Tissue-specific Interactions: GIANT leverages a tissue-specific gold standard to automatically up-weight datasets relevant to a tissue from a large data compendium of diverse tissues and cell-types. The resulting functional networks accurately capture tissue-specific functional interactions.
Multi-tissue Analysis: Beyond questions pertaining to the role of single genes in single tissues, GIANT also enables examination of changes in gene function across tissues on a broad scale. Users can compare a gene’s functional interactions in different tissues by selecting the relevant tissues in the dropdown menu.
NetWAS Analysis: GIANT can effectively reprioritize functional associations from a genome-wide association study (GWAS) and potentially identify additional disease-associated genes. The approach, named NetWAS, can be applied to any GWAS, and does not require that the phenotype or disease have any known associated genes.

View Project

HumanBase

HumanBase applies machine learning algorithms to learn biological associations from massive genomic data collections. These integrative analyses reach beyond existing “biological knowledge” represented in the literature to identify novel, data-driven associations.

View Project

IMP 2.0

Analyze your experimental results in the functional context of gene-gene networks from multiple organisms. Use IMP to direct additional functional experiments by identifying novel gene participants in a pathway or additional processes that a gene of interest participates in.

View Project

KNNimpute

KNNimpute is an implementation of the k-nearest neighbors algorithm for estimation of missing values in microarray data. In our comparative study of several methods for missing value estimation, we determined that KNNimpute provides superior performance in a variety of situations.
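The general idea of k-nearest-neighbor imputation can be sketched in a few lines. The example below is an illustration under stated assumptions, not the KNNimpute source code: the function name `knn_impute`, the root-mean-square distance over shared observed columns, and the inverse-distance weighting are choices made for this sketch. Rows stand for genes, columns for samples, and `None` marks a missing value.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries using the k most similar rows (hypothetical sketch)."""

    def distance(a, b):
        # Root-mean-square difference over columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    result = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors: other rows with an observed value in column j.
            candidates = [
                (distance(row, other), other[j])
                for m, other in enumerate(matrix)
                if m != i and other[j] is not None
            ]
            candidates = [c for c in candidates if math.isfinite(c[0])]
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if not nearest:
                continue  # No usable neighbor: leave the entry missing.
            exact = [v for d, v in nearest if d == 0.0]
            if exact:
                # Identical neighbors: average them to avoid dividing by zero.
                result[i][j] = sum(exact) / len(exact)
            else:
                # Closer neighbors contribute more to the estimate.
                weights = [1.0 / d for d, _ in nearest]
                weighted = sum(w * v for w, (_, v) in zip(weights, nearest))
                result[i][j] = weighted / sum(weights)
    return result


if __name__ == "__main__":
    expression = [
        [1.0, 2.0, None],
        [1.1, 2.1, 3.0],
        [0.9, 1.9, 2.8],
        [9.0, 8.0, 7.0],
    ]
    for row in knn_impute(expression, k=2):
        print(row)
```

The input matrix is left unchanged, and entries with no usable neighbor stay `None` so that the caller can decide how to handle them.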

View Project

Nano-Dissection

This server performs in silico nano-dissection, an approach we developed to identify genes with novel cell-lineage specific expression. This method leverages high-throughput functional genomics data from tissue homogenates to accurately predict genes enriched in specific cell types.

View Project

SEEK

SEEK is a computational gene co-expression search engine. SEEK provides biologists with a way to navigate the massive human expression compendium that now contains thousands of expression datasets. SEEK returns a robust ranking of co-expressed genes in the biological area of interest defined by the user’s query genes. At the same time, it prioritizes thousands of expression datasets according to the user’s query. The unique strengths of SEEK include its support for multi-gene queries and cross-platform analysis, as well as its rich visualization features.

View Project

Sleipnir

Sleipnir is a C++ library enabling efficient analysis, integration, mining, and machine learning over genomic data. This includes a particular focus on microarrays, since they make up the bulk of available data for many organisms, but Sleipnir can also integrate a wide variety of other data types, from pairwise physical interactions to sequence similarity or shared transcription factor binding sites. All analysis is done with attention to speed and memory usage, enabling the integration of hundreds of datasets covering tens of thousands of genes. In addition to the core library, Sleipnir comes with a variety of pre-made tools, providing solutions to common data processing tasks and examples to help you use Sleipnir in your own programs. Sleipnir is free, open source, fully documented, and ready to be used by itself or as a component in your computational biology analyses.

View Project

URSA(HD)

This is now the home of URSA and URSA(HD). Leveraging gene expression profiles of thousands of tissue and disease samples, URSA and URSA(HD) identify distinct molecular signatures of individual tissues and diseases. Submit your gene expression profile to use these molecular signatures and ascertain the tissue and disease signal in your data.

View Project

MuNDy

MuNDy is a C++ infrastructure for building scalable, biologically grounded microscale multibody dynamics software. It supports models with evolving mechanical and relational structure: heterogeneous rigid and flexible bodies; constraints, motors, and contacts; growth, division, death, and bonds that form, break, and reorganize; and long-range interactions mediated through a shared medium. Rather than a monolithic simulator, MuNDy builds upon Trilinos/STK’s runtime-extensible entity/part/field data model to provide reusable abstractions and data structures for this problem class. It is designed for research software developers building domain-specific applications across deployment scales, from laptops and workstations to multi-GPU clusters.

View Project

SkellySim

SkellySim is a package for simulating cellular components such as flexible filaments, motor proteins, and arbitrary rigid bodies. It is designed to be highly scalable, supporting both OpenMP- and MPI-style parallelism, and uses the efficient STKFMM/PVFMM libraries for hydrodynamic resolution.

View Project