698 Publications

Endothelial Cell Diversity Revealed by Global Expression Profiling

J-T. Chi, H. Chang, G. Haraldsen, F. Jahnsen, O. Troyanskaya, D. Chang, Z. Wang, S. Rockson, M. van de Rijn, D. Botstein, P. Brown

The vascular system is locally specialized to accommodate widely varying blood flow and pressure and the distinct needs of individual tissues. The endothelial cells (ECs) that line the lumens of blood and lymphatic vessels play an integral role in the regional specialization of vascular structure and physiology. However, our understanding of EC diversity is limited. To explore EC specialization on a global scale, we used DNA microarrays to determine the expression profile of 53 cultured ECs. We found that ECs from different blood vessels and microvascular ECs from different tissues have distinct and characteristic gene expression profiles. Pervasive differences in gene expression patterns distinguish the ECs of large vessels from microvascular ECs. We identified groups of genes characteristic of arterial and venous endothelium. Hey2, the human homologue of the zebrafish gene gridlock, was selectively expressed in arterial ECs and induced the expression of several arterial-specific genes. Several genes critical in the establishment of left/right asymmetry were expressed preferentially in venous ECs, suggesting coordination between vascular differentiation and body plan development. Tissue-specific expression patterns in different tissue microvascular ECs suggest they are distinct differentiated cell types that play roles in the local physiology of their respective organs and tissues.


A Bayesian Framework for Combining Heterogeneous Data Sources for Gene Function Prediction (in Saccharomyces cerevisiae)

O. Troyanskaya, K. Dolinski, A. Owen, R. Altman, D. Botstein

Genomic sequencing is no longer a novelty, but gene function annotation remains a key challenge in modern biology. A variety of functional genomics experimental techniques are available, from classic methods such as affinity precipitation to advanced high-throughput techniques such as gene expression microarrays. In the future, more disparate methods will be developed, further increasing the need for integrated computational analysis of data generated by these studies. We address this problem with MAGIC (Multisource Association of Genes by Integration of Clusters), a general framework that uses formal Bayesian reasoning to integrate heterogeneous types of high-throughput biological data (such as large-scale two-hybrid screens and multiple microarray analyses) for accurate gene function prediction. The system formally incorporates expert knowledge about relative accuracies of data sources to combine them within a normative framework. MAGIC provides a belief level with its output that allows the user to vary the stringency of predictions. We applied MAGIC to Saccharomyces cerevisiae genetic and physical interactions, microarray, and transcription factor binding sites data and assessed the biological relevance of gene groupings using Gene Ontology annotations produced by the Saccharomyces Genome Database. We found that by creating functional groupings based on heterogeneous data types, MAGIC improved accuracy of the groupings compared with microarray analysis alone. We describe several of the biological gene groupings identified.
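As a rough illustration of how a belief level can come from combining sources of different reliability, the sketch below computes a posterior probability that two genes are functionally linked. It is a simplification, not MAGIC itself: it assumes every data source is conditionally independent given the true link status (naive Bayes), whereas MAGIC uses a Bayesian network shaped by expert knowledge. The function name, the prior, and the per-source true/false positive rates are hypothetical values chosen for illustration.

```python
from typing import Sequence


def posterior_functional_link(
    prior: float,
    evidence: Sequence[bool],
    p_pos_if_linked: Sequence[float],
    p_pos_if_unlinked: Sequence[float],
) -> float:
    """Return P(genes functionally linked | evidence) under naive Bayes.

    Assumption: each source is conditionally independent given the true
    link status (a simplification of MAGIC's expert-built network).
    """
    if not (len(evidence) == len(p_pos_if_linked) == len(p_pos_if_unlinked)):
        raise ValueError("need one evidence flag and two likelihoods per source")
    joint_linked = prior
    joint_unlinked = 1.0 - prior
    for observed, tp, fp in zip(evidence, p_pos_if_linked, p_pos_if_unlinked):
        # A positive signal multiplies in the source's true/false positive
        # rate; a negative signal multiplies in the complement.
        joint_linked *= tp if observed else 1.0 - tp
        joint_unlinked *= fp if observed else 1.0 - fp
    return joint_linked / (joint_linked + joint_unlinked)


# Hypothetical reliabilities for three sources: two-hybrid screen,
# microarray co-clustering, shared transcription factor binding site.
TP = [0.6, 0.7, 0.5]
FP = [0.2, 0.1, 0.3]

belief = posterior_functional_link(0.1, [True, True, False], TP, FP)
print(f"belief = {belief:.3f}")
```

Raising the threshold a pair's belief must exceed before it is reported trades coverage for confidence, which is the kind of stringency control the abstract describes.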


Variation in Gene Expression Patterns in Follicular Lymphoma and the Response to Rituximab

S. Bohen, O. Troyanskaya, O. Alter, R. Warnke, D. Botstein, P. Brown, R. Levy

Analysis of the patterns of gene expression in follicular lymphomas from 24 patients suggested that two groups of tumors might be distinguished. All patients, whose biopsies were obtained before any treatment, were treated with rituximab, a monoclonal antibody directed against the B cell antigen, CD20. Gene expression patterns in the tumors that subsequently failed to respond to rituximab appeared more similar to those of normal lymphoid tissues than to gene expression patterns of tumors from rituximab responders. These findings suggest the possibility that the response of follicular lymphoma to rituximab treatment may be predicted from the gene expression pattern of tumors.

Follicular non-Hodgkin's lymphoma (NHL) is an indolent B cell malignancy with an annual incidence exceeding 10,000 cases in the United States. Although follicular lymphoma (FL) is frequently responsive to treatment, therapy is very rarely, if ever, curative. Rituximab, a chimeric IgG1 monoclonal antibody directed at the B cell antigen CD20, has become a mainstay of treatment for low-grade NHL; >400,000 patients worldwide have been treated with rituximab. Phase II trials of rituximab in patients with refractory or relapsed low-grade or follicular NHL demonstrated a 50% response rate (1).

Despite this extensive clinical experience, the mechanism of action of rituximab remains unclear, as does the nature of resistance (2). Among the proposed mechanisms are antibody-dependent cell-mediated cytotoxicity (3), complement-mediated cytotoxicity (4), and direct cytotoxicity through modulating CD20 function (5–7). The association of a low-affinity Fc receptor variant (8) with resistance to rituximab treatment suggests an immune mechanism and remains the only plausible hint about the nature of resistance.

In this study, we examined whether gene expression profiling using cDNA microarrays could reveal biological diversity among follicular lymphomas and, more specifically, whether gene expression patterns in tumors might predict sensitivity to rituximab treatment.


Phospholipase A2 Group IIA Expression in Gastric Adenocarcinoma Is Associated with Prolonged Survival and Less Frequent Metastasis

S. Leung, X. Chen, K. Chu, S. Yuen, J. Mathy, J. Ji, A. Chan, R. Li, S. Law, O. Troyanskaya, I. Tu, J. Wong, S. So, D. Botstein, P. Brown

We analyzed gene expression patterns in human gastric cancers by using cDNA microarrays representing approximately 30,300 genes. Expression of PLA2G2A, a gene previously implicated as a modifier of the Apc(Min/+) (multiple intestinal neoplasia 1) mutant phenotype in the mouse, was significantly correlated with patient survival. We confirmed this observation in an independent set of patient samples by using quantitative RT-PCR. Beyond its potential diagnostic and prognostic significance, this result suggests the intriguing possibility that the activity of PLA2G2A may suppress progression or metastasis of human gastric cancer.


Nonparametric Methods for Identifying Differentially Expressed Genes in Microarray Data

O. Troyanskaya, M. Garber, P. Brown, R. Altman

MOTIVATION:
Gene expression experiments provide a fast and systematic way to identify disease markers relevant to clinical care. In this study, we address the problem of robust identification of differentially expressed genes from microarray data. Differentially expressed genes, or discriminator genes, are genes with significantly different expression in two user-defined groups of microarray experiments. We compare three model-free approaches: (1) nonparametric t-test, (2) Wilcoxon (or Mann-Whitney) rank sum test, and (3) a heuristic method based on high Pearson correlation to a perfectly differentiating gene ('ideal discriminator method'). We systematically assess the performance of each method based on simulated and biological data under varying noise levels and p-value cutoffs.

RESULTS:
All methods exhibit very low false positive rates and identify a large fraction of the differentially expressed genes in simulated data sets with noise level similar to that of actual data. Overall, the rank sum test appears most conservative, which may be advantageous when the computationally identified genes need to be tested biologically. However, if a more inclusive list of markers is desired, a higher p-value cutoff or the nonparametric t-test may be appropriate. When applied to data from lung tumor and lymphoma data sets, the methods identify biologically relevant differentially expressed genes that allow clear separation of groups in question. Thus the methods described and evaluated here provide a convenient and robust way to identify differentially expressed genes for further biological and clinical analysis.
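Two of the three approaches can be sketched for a single gene in a few dozen lines of standard-library Python. The ideal discriminator score is the Pearson correlation between a gene's expression and an idealized gene that is 1 in group A and 0 in group B; the rank sum statistic is centered on its null expectation so that large magnitudes in either direction count as evidence. Computing p-values by shuffling group labels is an assumption of this sketch (the paper may use exact or normal-approximation p-values), and the nonparametric t-test is not shown. All function names are hypothetical.

```python
import math
import random
from typing import Callable, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def ideal_discriminator(expr: Sequence[float], labels: Sequence[bool]) -> float:
    """Correlation with an idealized gene: 1 in group A, 0 in group B."""
    return pearson(expr, [1.0 if lab else 0.0 for lab in labels])


def centered_rank_sum(expr: Sequence[float], labels: Sequence[bool]) -> float:
    """Wilcoxon rank sum of group A minus its expectation under the null."""
    order = sorted(range(len(expr)), key=lambda i: expr[i])
    ranks = [0.0] * len(expr)
    i = 0
    while i < len(order):
        j = i
        # Tied values share the average of their 1-based ranks.
        while j + 1 < len(order) and expr[order[j + 1]] == expr[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    n_a = sum(1 for lab in labels if lab)
    observed = sum(r for r, lab in zip(ranks, labels) if lab)
    return observed - n_a * (len(expr) + 1) / 2


def permutation_pvalue(
    stat: Callable[[Sequence[float], Sequence[bool]], float],
    expr: Sequence[float],
    labels: Sequence[bool],
    n_perm: int = 1000,
    seed: int = 0,
) -> float:
    """Two-sided p-value: fraction of label shuffles at least as extreme."""
    rng = random.Random(seed)
    observed = abs(stat(expr, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(stat(expr, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical gene measured in six arrays, three per group.
expr = [5.0, 6.0, 7.0, 1.0, 2.0, 3.0]
labels = [True, True, True, False, False, False]
print(f"r_ideal = {ideal_discriminator(expr, labels):.3f}")
print(f"p (rank sum) = {permutation_pvalue(centered_rank_sum, expr, labels):.3f}")
```

With only six arrays even a perfectly separated gene cannot reach a small p-value (2 of the 20 possible splits are equally extreme), which illustrates why the choice of p-value cutoff matters as much as the choice of test.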


Sequence Complexity Profiles of Prokaryotic Genomic Sequences: A Fast Algorithm for Calculating Linguistic Complexity

O. Troyanskaya, O. Arbell, Y. Koren, G. Landau, A. Bolshoy

MOTIVATION:
One of the major features of genomic DNA sequences, distinguishing them from texts in most spoken or artificial languages, is their high repetitiveness. Variation in the repetitiveness of genomic texts reflects the presence and density of different biologically important messages. Thus, deviation from an expected number of repeats in both directions indicates a possible presence of a biological signal. Linguistic complexity corresponds to repetitiveness of a genomic text, and potential regulatory sites may be discovered through construction of typical patterns of complexity distribution.
RESULTS:
We developed software for fast calculation of linguistic sequence complexity of DNA sequences. Our program utilizes suffix trees to compute the number of subwords present in genomic sequences, thereby allowing calculation of linguistic complexity in time linear in genome size. The measure of linguistic complexity was applied to the complete genome of Haemophilus influenzae. Maps of complexity along the entire genome were obtained using sliding windows of 40, 100, and 2000 nucleotides. This approach provided an efficient way to detect simple sequence repeats in this genome. In addition, local profiles of complexity distribution around the starts of translation were constructed for 21 complete prokaryotic genomes. We hypothesize that complexity profiles correspond to evolutionary relationships between organisms. We found principal differences in profiles of the GC-rich and other (non-GC-rich) genomes. We also found characteristic differences in profiles of AT genomes, which probably reflect individual species variations in translational regulation.
AVAILABILITY:
The program is available upon request from Alexander Bolshoy or at http://csweb.haifa.ac.il/library/#complex.
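The central computation, counting distinct subwords, can be sketched in linear time. This sketch swaps in a suffix automaton rather than the suffix tree the paper uses; both structures count all distinct substrings in time linear in sequence length. It assumes linguistic complexity is the number of distinct subwords divided by the maximum possible number for a sequence of that length over a four-letter alphabet; the paper's exact normalization may differ. The sliding-window profile rebuilds the automaton for every window, so unlike the published program it is not linear in genome size. All names are hypothetical.

```python
def count_distinct_substrings(s: str) -> int:
    """Count distinct non-empty substrings with a suffix automaton."""
    length, link, trans = [0], [-1], [{}]
    last = 0
    for ch in s:
        cur = len(length)
        length.append(length[last] + 1)
        link.append(-1)
        trans.append({})
        p = last
        while p != -1 and ch not in trans[p]:
            trans[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = trans[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split state q so every state keeps a single end-position class.
                clone = len(length)
                length.append(length[p] + 1)
                link.append(link[q])
                trans.append(dict(trans[q]))
                while p != -1 and trans[p].get(ch) == q:
                    trans[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    # Each state contributes the substrings whose lengths lie in (len(link), len].
    return sum(length[v] - length[link[v]] for v in range(1, len(length)))


def max_substrings(n: int, alphabet: int = 4) -> int:
    """Largest possible count of distinct substrings for a length-n sequence."""
    total, power = 0, 1
    for k in range(1, n + 1):
        # Cap the power: once it exceeds n, min() below no longer depends on it.
        power = min(power * alphabet, n + 1)
        total += min(power, n - k + 1)
    return total


def linguistic_complexity(s: str) -> float:
    return count_distinct_substrings(s) / max_substrings(len(s)) if s else 0.0


def complexity_profile(seq: str, window: int, step: int = 1) -> list:
    """Complexity of each sliding window (naive: one automaton per window)."""
    return [
        linguistic_complexity(seq[i : i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


print(complexity_profile("ACGTACGTAAAAAAAAACGT", window=8, step=4))
```

A run of a single nucleotide drives the windowed value down sharply, which is how a complexity map can flag simple sequence repeats.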

May 1, 2002

Diversity of Gene Expression in Adenocarcinoma of the Lung

M. Garber, O. Troyanskaya, K. Schluens, S. Petersen, Z. Thaesler, M. Pacyna-Gengelbach, M. van de Rijn, G. Rosen, C. Perou, R. Whyte, R. Altman, P. Brown, D. Botstein, I. Petersen

The global gene expression profiles for 67 human lung tumors representing 56 patients were examined by using 24,000-element cDNA microarrays. Subdivision of the tumors based on gene expression patterns faithfully recapitulated morphological classification of the tumors into squamous, large cell, small cell, and adenocarcinoma. The gene expression patterns made possible the subclassification of adenocarcinoma into subgroups that correlated with the degree of tumor differentiation as well as patient survival. Gene expression analysis thus promises to extend and refine standard pathologic analysis.


Missing Value Estimation Methods for DNA Microarrays

O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani, D. Botstein, R. Altman

MOTIVATION:
Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. For example, methods such as hierarchical clustering and K-means clustering are not robust to missing data, and may lose effectiveness even with a few missing values. Methods for imputing missing data are needed, therefore, to minimize the effect of incomplete data sets on analyses, and to increase the range of data sets to which these algorithms can be applied. In this report, we investigate automated methods for estimating missing data.

RESULTS:
We present a comparative study of several methods for the estimation of missing values in gene microarray data. We implemented and evaluated three methods: a Singular Value Decomposition (SVD) based method (SVDimpute), weighted K-nearest neighbors (KNNimpute), and row average. We evaluated the methods using a variety of parameter settings and over different real data sets, and assessed the robustness of the imputation methods to the amount of missing data over the range of 1-20% missing values. We show that KNNimpute appears to provide a more robust and sensitive method for missing value estimation than SVDimpute, and both SVDimpute and KNNimpute surpass the commonly used row average method (as well as filling missing values with zeros). We report results of the comparative experiments and provide recommendations and tools for accurate estimation of missing microarray data under a variety of conditions.
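The weighted K-nearest-neighbor idea can be sketched as follows: for each missing entry, find the k genes most similar to the target gene that do have a value in that array, and average their values with weights inversely proportional to distance. This is an illustrative sketch, not the published KNNimpute tool. Measuring distance as the root-mean-square difference over arrays observed in both genes, the row-average fallback when no neighbor exists, and the default k = 10 are assumptions, and the function names are hypothetical.

```python
import math
from typing import List, Optional

Matrix = List[List[Optional[float]]]


def _rms_distance(a, b) -> float:
    """RMS difference over columns observed in both rows (inf if none)."""
    diffs = [x - y for x, y in zip(a, b) if x is not None and y is not None]
    if not diffs:
        return math.inf
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def knn_impute(data: Matrix, k: int = 10) -> List[List[float]]:
    """Fill None entries (genes x arrays) with a weighted mean of k nearest genes."""
    filled = [list(row) for row in data]  # leave the input untouched
    for g, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors must have array j observed.
            candidates = []
            for h, other in enumerate(data):
                if h == g or other[j] is None:
                    continue
                d = _rms_distance(row, other)
                if math.isfinite(d):
                    candidates.append((d, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if not nearest:
                # No usable neighbor: fall back to the row average.
                observed = [v for v in row if v is not None]
                filled[g][j] = sum(observed) / len(observed) if observed else 0.0
            elif nearest[0][0] == 0.0:
                # Identical neighbors: use them directly to avoid dividing by zero.
                exact = [v for d, v in nearest if d == 0.0]
                filled[g][j] = sum(exact) / len(exact)
            else:
                weights = [1.0 / d for d, _ in nearest]
                total = sum(w * v for w, (_, v) in zip(weights, nearest))
                filled[g][j] = total / sum(weights)
    return filled


# Hypothetical 3-gene x 3-array matrix with one missing value.
data = [[1.0, 2.0, 3.0], [2.0, 3.0, None], [4.0, 5.0, 9.0]]
print(knn_impute(data, k=2))
```

Here the closer neighbor (distance 1) gets twice the weight of the farther one (distance 2), so the estimate leans toward the first gene's value rather than the plain column mean.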
