CCB: Publications

Gene Expression Patterns in Ovarian Carcinomas

M. Schaner, D. Ross, G. Ciaravino, T. Sørlie, O. Troyanskaya, M. Diehn, Y. Wang, G. Duran, T. Sikic, S. Caldeira, H. Skomedal, I-P. Tu, T. Hernandez-Boussard, S. Johnson, P. O'Dwyer, M. Fero, G. Kristensen, A-L. Børresen-Dale, T. Hastie, R. Tibshirani, M. van de Rijn, N. Teng, T. Longacre, D. Botstein, P. Brown, B. Sikic

We used DNA microarrays to characterize the global gene expression patterns in surface epithelial cancers of the ovary. We identified groups of genes that distinguished the clear cell subtype from other ovarian carcinomas, grade I and II from grade III serous papillary carcinomas, and ovarian from breast carcinomas. Six clear cell carcinomas were distinguished from 36 other ovarian carcinomas (predominantly serous papillary) based on their gene expression patterns. The differences may yield insights into the worse prognosis and therapeutic resistance associated with clear cell carcinomas. A comparison of the gene expression patterns in the ovarian cancers to published data of gene expression in breast cancers revealed a large number of differentially expressed genes. We identified a group of 62 genes that correctly classified all 125 breast and ovarian cancer specimens. Among the best discriminators more highly expressed in the ovarian carcinomas were PAX8 (paired box gene 8), mesothelin, and ephrin-B1 (EFNB1). Although estrogen receptor was expressed in both the ovarian and breast cancers, genes that are coregulated with the estrogen receptor in breast cancers, including GATA-3, LIV-1, and X-box binding protein 1, did not show a similar pattern of coexpression in the ovarian cancers.

Systemic and Cell Type-Specific Gene Expression Patterns in Scleroderma Skin

M. Whitfield, D. Finlay, J. Isaac Murray, O. Troyanskaya, J-T. Chi, A. Pergamenschikov, T. McCalmont, P. Brown, D. Botstein, M. Kari Connolly

We used DNA microarrays representing >12,000 human genes to characterize gene expression patterns in skin biopsies from individuals with a diagnosis of systemic sclerosis with diffuse scleroderma. We found consistent differences in the patterns of gene expression between skin biopsies from individuals with scleroderma and those from normal, unaffected individuals. The biopsies from affected individuals showed nearly indistinguishable patterns of gene expression in clinically affected and clinically unaffected tissue, even though these were clearly distinguishable from the patterns found in similar tissue from unaffected individuals. Genes characteristically expressed in endothelial cells, B lymphocytes, and fibroblasts showed differential expression between scleroderma and normal biopsies. Analysis of lymphocyte populations in scleroderma skin biopsies by immunohistochemistry suggests the B lymphocyte signature observed on our arrays is from CD20+ B cells. These results provide evidence that scleroderma has systemic manifestations that affect multiple cell types and suggest genes that could be used as potential markers for the disease.

Variation in Gene Expression Patterns in Human Gastric Cancers

X. Chen, S. Leung, S. Yuen, K-M. Chu, J. Ji, R. Li, A. Chan, S. Law, O. Troyanskaya, J. Wong, S. So, D. Botstein, P. Brown

Gastric cancer is the world's second most common cause of cancer death. We analyzed gene expression patterns in 90 primary gastric cancers, 14 metastatic gastric cancers, and 22 nonneoplastic gastric tissues, using cDNA microarrays representing ∼30,300 genes. Gastric cancers were distinguished from nonneoplastic gastric tissues by characteristic differences in their gene expression patterns. We found a diversity of gene expression patterns in gastric cancer, reflecting variation in intrinsic properties of tumor and normal cells and variation in the cellular composition of these complex tissues. We identified several genes whose expression levels were significantly correlated with patient survival. The variations in gene expression patterns among cancers in different patients suggest differences in pathogenetic pathways and potential therapeutic strategies.

Endothelial Cell Diversity Revealed by Global Expression Profiling

J-T. Chi, H. Chang, G. Haraldsen, F. Jahnsen, O. Troyanskaya, D. Chang, Z. Wang, S. Rockson, M. van de Rijn, D. Botstein, P. Brown

The vascular system is locally specialized to accommodate widely varying blood flow and pressure and the distinct needs of individual tissues. The endothelial cells (ECs) that line the lumens of blood and lymphatic vessels play an integral role in the regional specialization of vascular structure and physiology. However, our understanding of EC diversity is limited. To explore EC specialization on a global scale, we used DNA microarrays to determine the expression profile of 53 cultured ECs. We found that ECs from different blood vessels and microvascular ECs from different tissues have distinct and characteristic gene expression profiles. Pervasive differences in gene expression patterns distinguish the ECs of large vessels from microvascular ECs. We identified groups of genes characteristic of arterial and venous endothelium. Hey2, the human homologue of the zebrafish gene gridlock, was selectively expressed in arterial ECs and induced the expression of several arterial-specific genes. Several genes critical in the establishment of left/right asymmetry were expressed preferentially in venous ECs, suggesting coordination between vascular differentiation and body plan development. Tissue-specific expression patterns in different tissue microvascular ECs suggest they are distinct differentiated cell types that play roles in the local physiology of their respective organs and tissues.

A Bayesian Framework for Combining Heterogeneous Data Sources for Gene Function Prediction (in Saccharomyces cerevisiae)

O. Troyanskaya, K. Dolinski, A. Owen, R. Altman, D. Botstein

Genomic sequencing is no longer a novelty, but gene function annotation remains a key challenge in modern biology. A variety of functional genomics experimental techniques are available, from classic methods such as affinity precipitation to advanced high-throughput techniques such as gene expression microarrays. In the future, more disparate methods will be developed, further increasing the need for integrated computational analysis of data generated by these studies. We address this problem with MAGIC (Multisource Association of Genes by Integration of Clusters), a general framework that uses formal Bayesian reasoning to integrate heterogeneous types of high-throughput biological data (such as large-scale two-hybrid screens and multiple microarray analyses) for accurate gene function prediction. The system formally incorporates expert knowledge about relative accuracies of data sources to combine them within a normative framework. MAGIC provides a belief level with its output that allows the user to vary the stringency of predictions. We applied MAGIC to Saccharomyces cerevisiae genetic and physical interactions, microarray, and transcription factor binding sites data and assessed the biological relevance of gene groupings using Gene Ontology annotations produced by the Saccharomyces Genome Database. We found that by creating functional groupings based on heterogeneous data types, MAGIC improved accuracy of the groupings compared with microarray analysis alone. We describe several of the biological gene groupings identified.
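
The idea of combining evidence sources of differing reliability into one belief level can be illustrated with a small sketch. This is not MAGIC's Bayesian network: it is a simplified naive Bayes update that assumes the evidence sources are conditionally independent, and the function names, the prior, and all probabilities below are hypothetical values chosen for illustration.

```python
def posterior_functional_link(prior, evidence):
    """Posterior probability that two genes are functionally related.

    prior: assumed prior probability of a functional relationship.
    evidence: list of (p_obs_given_link, p_obs_given_no_link) pairs, one per
        data source, encoding how reliable that source is assumed to be.
    Assumes the sources are conditionally independent (naive Bayes), which is
    a simplification and not the network structure described in the abstract.
    """
    odds = prior / (1.0 - prior)
    for p_link, p_no_link in evidence:
        odds *= p_link / p_no_link  # multiply in each source's likelihood ratio
    return odds / (1.0 + odds)


def filter_by_belief(pairs, threshold):
    """Keep gene pairs whose belief level meets a user-chosen stringency."""
    return [(pair, belief) for pair, belief in pairs if belief >= threshold]


# Hypothetical example: a coexpression cluster and a two-hybrid interaction.
belief = posterior_functional_link(
    prior=0.1,
    evidence=[(0.9, 0.3), (0.6, 0.2)],
)
```

Raising the threshold passed to `filter_by_belief` trades coverage for confidence, which mirrors the adjustable stringency the abstract describes.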

Variation in Gene Expression Patterns in Follicular Lymphoma and the Response to Rituximab

S. Bohen, O. Troyanskaya, O. Alter, R. Warnke, D. Botstein, P. Brown, R. Levy

Analysis of the patterns of gene expression in follicular lymphomas from 24 patients suggested that two groups of tumors might be distinguished. All patients, whose biopsies were obtained before any treatment, were treated with rituximab, a monoclonal antibody directed against the B cell antigen, CD20. Gene expression patterns in the tumors that subsequently failed to respond to rituximab appeared more similar to those of normal lymphoid tissues than to gene expression patterns of tumors from rituximab responders. These findings suggest the possibility that the response of follicular lymphoma to rituximab treatment may be predicted from the gene expression pattern of tumors.

Follicular non-Hodgkin's lymphoma (NHL) is an indolent B cell malignancy with an annual incidence exceeding 10,000 cases in the United States. Although follicular lymphoma (FL) is frequently responsive to treatment, therapy is very rarely, if ever, curative. Rituximab, a chimeric IgG1 monoclonal antibody directed at the B cell antigen CD20, has become a mainstay of treatment for low-grade NHL; >400,000 patients worldwide have been treated with rituximab. Phase II trials of rituximab in patients with refractory or relapsed low-grade or follicular NHL demonstrated a 50% response rate (1).

Despite this extensive clinical experience, the mechanism of action of rituximab remains unclear, as does the nature of resistance (2). Among the proposed mechanisms are antibody-dependent cell-mediated cytotoxicity (3), complement-mediated cytotoxicity (4), and direct cytotoxicity through modulating CD20 function (5–7). The association with resistance to rituximab treatment of a low-affinity variant of the Fc receptor (8) is suggestive of an immune mechanism, and remains the only plausible hint about the nature of resistance.

In this study, we examined whether gene expression profiling using cDNA microarrays could reveal biological diversity among follicular lymphomas and, more specifically, whether gene expression patterns in tumors might predict sensitivity to rituximab treatment.

Phospholipase A2 Group IIA Expression in Gastric Adenocarcinoma Is Associated with Prolonged Survival and Less Frequent Metastasis

S. Leung, X. Chen, K. Chu, S. Yuen, J. Mathy, J. Ji, A. Chan, R. Li, S. Law, O. Troyanskaya, I. Tu, J. Wong, S. So, D. Botstein, P. Brown

We analyzed gene expression patterns in human gastric cancers by using cDNA microarrays representing approximately 30,300 genes. Expression of PLA2G2A, a gene previously implicated as a modifier of the Apc(Min/+) (multiple intestinal neoplasia 1) mutant phenotype in the mouse, was significantly correlated with patient survival. We confirmed this observation in an independent set of patient samples by using quantitative RT-PCR. Beyond its potential diagnostic and prognostic significance, this result suggests the intriguing possibility that the activity of PLA2G2A may suppress progression or metastasis of human gastric cancer.

Nonparametric Methods for Identifying Differentially Expressed Genes in Microarray Data

O. Troyanskaya, M. Garber, P. Brown, R. Altman

MOTIVATION:
Gene expression experiments provide a fast and systematic way to identify disease markers relevant to clinical care. In this study, we address the problem of robust identification of differentially expressed genes from microarray data. Differentially expressed genes, or discriminator genes, are genes with significantly different expression in two user-defined groups of microarray experiments. We compare three model-free approaches: (1) nonparametric t-test, (2) Wilcoxon (or Mann-Whitney) rank sum test, and (3) a heuristic method based on high Pearson correlation to a perfectly differentiating gene ('ideal discriminator method'). We systematically assess the performance of each method based on simulated and biological data under varying noise levels and p-value cutoffs.

RESULTS:
All methods exhibit very low false positive rates and identify a large fraction of the differentially expressed genes in simulated data sets with noise level similar to that of actual data. Overall, the rank sum test appears most conservative, which may be advantageous when the computationally identified genes need to be tested biologically. However, if a more inclusive list of markers is desired, a higher p-value cutoff or the nonparametric t-test may be appropriate. When applied to data from lung tumor and lymphoma data sets, the methods identify biologically relevant differentially expressed genes that allow clear separation of groups in question. Thus the methods described and evaluated here provide a convenient and robust way to identify differentially expressed genes for further biological and clinical analysis.
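
Two of the statistics named above can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: it computes only the raw rank sum statistic and the Pearson correlation to a 0/1 group pattern, and it omits the p-value estimation the abstract refers to. The function names and the 0/1 label encoding are assumptions made for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ideal_discriminator_score(expression, labels):
    """Correlation of a gene's expression with a perfectly differentiating
    pattern, encoded here (by assumption) as 0 for group A and 1 for group B."""
    return pearson(expression, [float(label) for label in labels])


def rank_sum_statistic(expression, labels):
    """Sum of the ranks of group-1 samples; tied values share an average rank."""
    order = sorted(range(len(expression)), key=lambda i: expression[i])
    ranks = [0.0] * len(expression)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of samples tied with the sample at position i.
        while j + 1 < len(order) and expression[order[j + 1]] == expression[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return sum(rank for rank, label in zip(ranks, labels) if label == 1)


# Hypothetical gene measured in six experiments, three per group.
labels = [0, 0, 0, 1, 1, 1]
gene = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
score = ideal_discriminator_score(gene, labels)
w = rank_sum_statistic(gene, labels)
```

A gene whose expression separates the groups completely receives the extreme rank sum for its group size, while the correlation score also reflects how closely the expression values follow the step pattern.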

Sequence Complexity Profiles of Prokaryotic Genomic Sequences: A Fast Algorithm for Calculating Linguistic Complexity

O. Troyanskaya, O. Arbell, Y. Koren, G. Landau, A. Bolshoy

MOTIVATION:
One of the major features of genomic DNA sequences, distinguishing them from texts in most spoken or artificial languages, is their high repetitiveness. Variation in the repetitiveness of genomic texts reflects the presence and density of different biologically important messages. Thus, deviation from an expected number of repeats in both directions indicates a possible presence of a biological signal. Linguistic complexity corresponds to repetitiveness of a genomic text, and potential regulatory sites may be discovered through construction of typical patterns of complexity distribution.
RESULTS:
We developed software for fast calculation of linguistic sequence complexity of DNA sequences. Our program utilizes suffix trees to compute the number of subwords present in genomic sequences, thereby allowing calculation of linguistic complexity in time linear in genome size. The measure of linguistic complexity was applied to the complete genome of Haemophilus influenzae. Maps of complexity along the entire genome were obtained using sliding windows of 40, 100, and 2000 nucleotides. This approach provided an efficient way to detect simple sequence repeats in this genome. In addition, local profiles of complexity distribution around the starts of translation were constructed for 21 complete prokaryotic genomes. We hypothesize that complexity profiles correspond to evolutionary relationships between organisms. We found principal differences in profiles of the GC-rich and other (non-GC-rich) genomes. We also found characteristic differences in profiles of AT genomes, which probably reflect individual species variations in translational regulation.
AVAILABILITY:
The program is available upon request from Alexander Bolshoy or at http://csweb.haifa.ac.il/library/#complex.
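
As a rough illustration of the measure, the sketch below assumes linguistic complexity is taken as the number of distinct subwords observed in a sequence divided by the maximum number possible for a sequence of that length over a four-letter alphabet. It enumerates subwords directly with sets, so it is a naive approach suitable only for short windows and is not the suffix-tree algorithm described above. The function names are hypothetical.

```python
def linguistic_complexity(seq, alphabet_size=4):
    """Ratio of distinct subwords observed to the maximum possible.

    Naive enumeration with sets; not the linear-time suffix-tree method.
    Assumes a non-empty sequence.
    """
    n = len(seq)
    observed = 0
    maximum = 0
    for k in range(1, n + 1):
        words = {seq[i:i + k] for i in range(n - k + 1)}
        observed += len(words)
        # At most alphabet_size**k distinct words of length k exist, and a
        # sequence of length n contains only n - k + 1 positions for them.
        maximum += min(alphabet_size ** k, n - k + 1)
    return observed / maximum


def complexity_profile(seq, window, step=1):
    """Complexity of each sliding window along the sequence."""
    return [
        linguistic_complexity(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


# A repetitive stretch scores low and a varied stretch scores high.
profile = complexity_profile("AAAAACGT", window=4)
```

Low values in such a profile mark repetitive regions, which is how a sliding-window map can point to simple sequence repeats.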

Diversity of Gene Expression in Adenocarcinoma of the Lung

M. Garber, O. Troyanskaya, K. Schluens, S. Petersen, Z. Thaesler, M. Pacyna-Gengelbach, M. van de Rijn, G. Rosen, C. Perou, R. Whyte, R. Altman, P. Brown, D. Botstein, I. Petersen

The global gene expression profiles for 67 human lung tumors representing 56 patients were examined by using 24,000-element cDNA microarrays. Subdivision of the tumors based on gene expression patterns faithfully recapitulated morphological classification of the tumors into squamous, large cell, small cell, and adenocarcinoma. The gene expression patterns made possible the subclassification of adenocarcinoma into subgroups that correlated with the degree of tumor differentiation as well as patient survival. Gene expression analysis thus promises to extend and refine standard pathologic analysis.
