Despite its power and versatility, the Rosetta software suite struggles to take full advantage of modern computing hardware. Masala is a set of C++ libraries under development in the Biomolecular Design Group that are intended to replace much of the functionality of Rosetta and to harness modern massively parallel computing hardware (CPUs, GPUs, or FPGAs), as well as current and near-future quantum computers (quantum annealers or gate-based systems), for NP-hard optimization problems. Masala’s versatile plugin system will also permit third parties to develop specialized molecular modelling protocols. Masala is intended for use either as a support library for other software (such as Rosetta) or as a standalone set of applications, and we plan to release it as free and open-source software in the near future.
For over twenty years, the Rosetta software suite has been one of the foremost software packages for modeling, docking, designing, and predicting structures of proteins built from the 20 canonical amino acid building-blocks. Our work has expanded the palette of building-blocks that can be accurately modelled in Rosetta and has allowed us to produce mixed-chirality peptide macrocycles targeting a number of proteins of medicinal relevance.
A major limitation of conventional macromolecular modelling approaches is the use of force fields to compute energies. Although force fields are fast, they are highly approximate, and must be tuned and calibrated for a particular set of chemical building-blocks. Quantum chemistry methods offer far greater accuracy and generality. These methods were long thought to be too slow for macromolecular modelling applications, but recent work in semi-empirical methods and in fragment molecular orbital methods has brought quantum chemistry calculations on large macromolecules into the realm of computational tractability. RosettaQM is a project started in the Biomolecular Design Group that permits Rosetta to carry out quantum mechanical calculations in the context of any design, docking, or structure prediction protocol. Development is ongoing.
This is a simulation tool for tracking assemblies of microtubules driven by motor proteins. Microtubules are modeled as straight rigid fibers, and motor proteins are modeled as Hookean springs with active binding-unbinding dynamics governed by a two-stage kinetic Monte Carlo model that imposes detailed balance. A state-of-the-art collision resolution algorithm based on geometrically constrained optimization is incorporated to reach long physical timescales. This package is massively parallel.
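As a rough illustration of how a Hookean spring and a detailed-balance condition can be combined in a kinetic Monte Carlo step, here is a minimal Python sketch. It is not taken from the package: the function names, the choice of a constant unbinding rate, and the single binding stage (rather than the two-stage model described above) are all assumptions made for illustration.

```python
import math
import random


def spring_energy(length, rest_length, stiffness):
    """Energy of a Hookean spring: E = 0.5 * k * (l - l0)^2."""
    return 0.5 * stiffness * (length - rest_length) ** 2


def binding_rates(length, rest_length, stiffness, base_rate, kT):
    """Return (k_on, k_off) for one motor head.

    Assumption: the unbinding rate is constant and the binding rate is
    weighted by the Boltzmann factor of the spring energy, so that
    k_on / k_off = exp(-E / kT), which satisfies detailed balance.
    """
    energy = spring_energy(length, rest_length, stiffness)
    k_on = base_rate * math.exp(-energy / kT)
    k_off = base_rate
    return k_on, k_off


def kmc_step(is_bound, k_on, k_off, dt, rng):
    """Advance the bound/unbound state by one time step dt.

    For a Poisson process with the given rate, the probability of at
    least one event in dt is 1 - exp(-rate * dt).
    """
    rate = k_off if is_bound else k_on
    probability = 1.0 - math.exp(-rate * dt)
    if rng.random() < probability:
        return not is_bound
    return is_bound


if __name__ == "__main__":
    rng = random.Random(0)
    k_on, k_off = binding_rates(
        length=1.2, rest_length=1.0, stiffness=10.0, base_rate=5.0, kT=1.0
    )
    state = False
    for _ in range(10):
        state = kmc_step(state, k_on, k_off, dt=0.01, rng=rng)
    print("bound" if state else "unbound")
```

With this choice of rates, a strongly stretched spring binds rarely, while unbinding proceeds at the same rate regardless of extension; other splits of the Boltzmann factor between the two rates would satisfy detailed balance equally well.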
This is a numerical computation package for various single- and double-layer kernels for Laplace and Stokes operators in boundary integral methods, implemented on top of the highly optimized kernel-independent fast multipole method package PVFMM. All kernel evaluations are hand-optimized with AVX2 SIMD instructions to maximize evaluation speed.
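For readers unfamiliar with these kernels, the following Python sketch evaluates the free-space single-layer kernels for the 3D Laplace and Stokes operators by direct summation. It only illustrates the standard formulas, 1/(4πr) for Laplace and the Stokeslet (δ_ij/r + r_i r_j/r³)/(8πμ) for Stokes; it does not use the package or PVFMM, and the function names are hypothetical. A fast multipole method computes the same sums without the quadratic cost of the double loop.

```python
import math


def laplace_single_layer(targets, sources, charges):
    """Potential at each target: sum over sources of q / (4 * pi * r)."""
    potentials = []
    for target in targets:
        total = 0.0
        for source, charge in zip(sources, charges):
            r = math.dist(target, source)
            if r > 0.0:  # skip the singular self-interaction
                total += charge / (4.0 * math.pi * r)
        potentials.append(total)
    return potentials


def stokes_single_layer(targets, sources, forces, viscosity=1.0):
    """Velocity at each target induced by point forces (Stokeslets)."""
    prefactor = 1.0 / (8.0 * math.pi * viscosity)
    velocities = []
    for target in targets:
        velocity = [0.0, 0.0, 0.0]
        for source, force in zip(sources, forces):
            d = [target[i] - source[i] for i in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            if r == 0.0:  # skip the singular self-interaction
                continue
            d_dot_f = sum(d[i] * force[i] for i in range(3))
            for i in range(3):
                velocity[i] += prefactor * (force[i] / r + d[i] * d_dot_f / r**3)
        velocities.append(velocity)
    return velocities


if __name__ == "__main__":
    sources = [(0.0, 0.0, 0.0)]
    targets = [(1.0, 0.0, 0.0)]
    print(laplace_single_layer(targets, sources, [1.0]))
    print(stokes_single_layer(targets, sources, [(1.0, 0.0, 0.0)]))
```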
DeepSEA is a deep learning-based algorithmic framework for predicting the chromatin effects of sequence alterations with single-nucleotide sensitivity. DeepSEA can accurately predict the epigenetic state of a sequence, including transcription factor binding, DNase I sensitivity, and histone marks in multiple cell types, and uses this capability to predict the chromatin effects of sequence variants and to prioritize regulatory variants.
Tissue-specific Interactions: FNTM leverages a tissue-specific gold standard to automatically up-weight datasets relevant to a tissue from a large data compendium of diverse tissues and cell-types. The resulting functional networks accurately capture tissue-specific functional interactions.
Multi-tissue Analysis: Beyond questions pertaining to the role of single genes in single tissues, FNTM also enables examination of changes in gene function across tissues on a broad scale. Users can compare a gene’s functional interaction in different tissues by selecting the relevant tissues in the dropdown menu.
Tissue-specific Interactions: GIANT leverages a tissue-specific gold standard to automatically up-weight datasets relevant to a tissue from a large data compendium of diverse tissues and cell-types. The resulting functional networks accurately capture tissue-specific functional interactions.
Multi-tissue Analysis: Beyond questions pertaining to the role of single genes in single tissues, GIANT also enables examination of changes in gene function across tissues on a broad scale. Users can compare a gene’s functional interaction in different tissues by selecting the relevant tissues in the dropdown menu.
NetWAS Analysis: GIANT can effectively reprioritize functional associations from a genome-wide association study (GWAS) and potentially identify additional disease-associated genes. The approach, named NetWAS, can be applied to any GWAS and does not require that the phenotype or disease have any known associated genes.
KNNimpute is an implementation of the k-nearest neighbors algorithm for estimation of missing values in microarray data. In our comparative study of several different methods used for missing value estimation we determined that KNNimpute provides superior performance in a variety of situations.
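The general idea of k-nearest-neighbor imputation can be sketched in a few lines of Python. This is a simplified illustration rather than the KNNimpute code itself: the function name, the use of `None` for missing values, the root-mean-square distance over shared columns, and the inverse-distance weighting are assumptions chosen for the example.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries of a genes-by-experiments matrix.

    For each missing value in row i, column j, find the k rows that have
    a value in column j and are closest to row i over the columns both
    rows have observed, then take an inverse-distance weighted average.
    """
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_index, other in enumerate(matrix):
                if other_index == i or other[j] is None:
                    continue
                shared = [
                    (a, b)
                    for a, b in zip(row, other)
                    if a is not None and b is not None
                ]
                if not shared:
                    continue
                distance = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared)
                )
                candidates.append((distance, other[j]))
            if not candidates:
                continue  # no usable neighbor, leave the value missing
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            # Small constant avoids division by zero for identical rows.
            weights = [1.0 / (distance + 1e-9) for distance, _ in nearest]
            weighted_sum = sum(w * v for w, (_, v) in zip(weights, nearest))
            filled[i][j] = weighted_sum / sum(weights)
    return filled


if __name__ == "__main__":
    data = [
        [1.0, 2.0, None],
        [1.0, 2.0, 3.0],
        [10.0, 20.0, 30.0],
    ]
    print(knn_impute(data, k=1))
```

Distances are computed only from the original observed values, so an imputed entry never influences another imputation in the same pass.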
This server performs in silico nano-dissection, an approach we developed to identify genes with novel cell-lineage specific expression. This method leverages high-throughput functional genomics data from tissue homogenates to accurately predict genes enriched in specific cell types.
SEEK is a computational gene co-expression search engine. SEEK provides biologists with a way to navigate the massive human expression compendium that now contains thousands of expression datasets. SEEK returns a robust ranking of co-expressed genes in the biological area of interest defined by the user’s query genes. At the same time, it prioritizes thousands of expression datasets according to the user’s query of interest. The unique strengths of SEEK include its support for multi-gene queries and cross-platform analysis, as well as its rich visualization features.
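To make the notion of ranking co-expressed genes concrete, the sketch below scores each gene in a single dataset by its mean Pearson correlation with the query genes. This is a deliberately simplified, hypothetical illustration: it does not reproduce SEEK's algorithm, its dataset prioritization, or its interface, and the function names and toy gene labels are invented for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # a flat profile carries no co-expression signal
    return covariance / (norm_x * norm_y)


def rank_coexpressed(expression, query_genes):
    """Rank non-query genes by mean correlation with the query genes."""
    scores = {}
    for gene, profile in expression.items():
        if gene in query_genes:
            continue
        correlations = [pearson(profile, expression[q]) for q in query_genes]
        scores[gene] = sum(correlations) / len(correlations)
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy dataset: gene label -> expression values across four samples.
    expression = {
        "Q1": [1.0, 2.0, 3.0, 4.0],
        "Q2": [2.0, 4.0, 6.0, 8.5],
        "A": [1.0, 2.0, 3.0, 5.0],
        "B": [4.0, 3.0, 2.0, 1.0],
        "C": [1.0, 1.0, 2.0, 1.0],
    }
    print(rank_coexpressed(expression, ["Q1", "Q2"]))
```

Extending this to many datasets would require some way of weighting each dataset by its relevance to the query, which is the part of the problem the dataset prioritization described above addresses.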
Sleipnir is a C++ library enabling efficient analysis, integration, mining, and machine learning over genomic data. This includes a particular focus on microarrays, since they make up the bulk of available data for many organisms, but Sleipnir can also integrate a wide variety of other data types, from pairwise physical interactions to sequence similarity or shared transcription factor binding sites. All analysis is done with attention to speed and memory usage, enabling the integration of hundreds of datasets covering tens of thousands of genes. In addition to the core library, Sleipnir comes with a variety of pre-made tools, providing solutions to common data processing tasks and examples to help you use Sleipnir in your own programs. Sleipnir is free, open source, fully documented, and ready to be used by itself or as a component in your computational biology analyses.
This is now the home of URSA and URSA(HD). Leveraging gene expression profiles of thousands of tissue and disease samples, URSA and URSA(HD) identify distinct molecular signatures of individual tissues and diseases. Submit your gene expression profile to use these molecular signatures and ascertain the tissue and disease signal in your data.
SkellySim is a package for simulating cellular components such as flexible filaments, motor proteins, and arbitrary rigid bodies. It’s designed to be highly scalable, capable of both OpenMP- and MPI-style parallelism, while using the efficient STKFMM/PVFMM libraries for hydrodynamic resolution.