Overview

Our laboratory's research focuses on how best to use high-throughput data sources to understand biology at multiple levels. This problem has become increasingly challenging over the past decade as new experimental techniques and resources (e.g., gene expression microarrays, deep sequencing, and tandem mass spectrometry) have become widely available and more affordable. While these data promise to shed light on cellular mechanisms, gene regulation, protein functions, and ultimately human disease, they are currently translated into knowledge much more slowly than they are generated. To help bridge this gap, we focus on developing novel algorithms and approaches for the analysis, exploration, and visualization of these data. In particular, these methods incorporate biologists into the early phases of analysis in order to draw on their existing expert knowledge.

Scientific report

Computationally Uncovering Protein Roles

While we now know the genome sequences of many organisms (including mice and humans), we still know relatively little about the roles played by the genes and proteins encoded within these sequences. Researchers have generated a great deal of data that can potentially shed light on the functions and processes performed by proteins, but these datasets are generally noisy, heterogeneous, and very large. Our group develops machine learning and data mining techniques that overcome these challenges and applies them to these data in order to form highly confident predictions of protein roles. We then take the predictions back to the lab bench with our collaborators to confirm their validity.

Search Algorithms and Data Organization

Among our efforts in data mining, our group develops similarity search algorithms to investigate biological hypotheses and to create easily searchable community data resources. With the huge expansion of data generation over the past few years, it has become impossible for researchers to understand, or even examine, all of the publicly available data. Our group aims to organize these data and provide intuitive and useful interfaces so that researchers can find the information they need.

Gene Expression Analysis

Gene expression microarray technology has been responsible for much of the functional genomics data generated in recent years. These data promise to help researchers investigate the regulation and transcription of genes, which is vital for understanding the ultimate roles that proteins play within cells, as well as for developing diagnostic tests and finding new drug targets. Microarray data can be particularly difficult to analyze and interpret due to unusual noise characteristics and variation between protocols and technologies. We are developing methods that harness large collections of microarray data and make them more accessible to researchers. Our approaches are also readily adaptable to newer technologies that measure transcription, such as deep sequencing.

Large-Scale Data Visualization

One of the best ways for researchers to understand their data is to look for patterns in them visually. However, the scale of genome-wide datasets prevents traditional methods and devices from displaying these data in full. We are developing techniques that use large-scale display devices as well as traditional displays to show researchers the information they need to extract from their data. Further, we incorporate statistical measures directly into visualization schemes to improve their effectiveness and accuracy.

Lab staff

Co-Op Associate: Joseph Bane
Research Administrative Assistant: Christina Gagliardi

Publication listings

(2005-present)

Hess DC, Myers CL, Huttenhower C, Hibbs MA, Hayes AP, Paw J, Clore JJ, Mendoza RM, Luis BS, Nislow C, Giaever G, Costanzo M, Troyanskaya OG, Caudy AA. 2009. Computationally driven, quantitative experiments discover genes required for mitochondrial biogenesis. PLoS Genet 5(3):e1000407. PMC2648979

Hibbs MA. 2009. The Effects of Pre-processing and Parameter Choices on Searches Through Large Gene Expression Data Collections. IEEE Int Conf on Genomic Signal Processing and Statistics (GENSiPs).

Hibbs MA, Myers CL, Huttenhower C, Hess DC, Li K, Caudy AA, Troyanskaya OG. 2009. Directing experimental biology: a case study in mitochondrial biogenesis. PLoS Comput Biol 5(3):e1000322. PMC2654405

Huttenhower C, Haley EM, Hibbs MA, Dumeaux V, Barrett DR, Coller HA, Troyanskaya OG. 2009. Exploring the human genome with functional maps. Genome Res 19(6):1093-1106. PMC2694471

Huttenhower C, Hibbs MA, Myers CL, Caudy AA, Hess DC, Troyanskaya OG. 2009. The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction. Bioinformatics Epub ahead of print.

Haarer B, Viggiano S, Hibbs MA, Troyanskaya OG, Amberg DC. 2007. Modeling complex genetic interactions in a simple eukaryotic genome: actin displays a rich spectrum of complex haploinsufficiencies. Genes Dev 21(2):148-159. PMC1770898

Hibbs MA, Hess DC, Myers CL, Huttenhower C, Li K, Troyanskaya OG. 2007. Exploring the functional landscape of gene expression: directed search of large microarray compendia. Bioinformatics 23(20):2692-2699.

Hibbs MA, Wallace G, Dunham M, Li K, Troyanskaya OG. 2007. Viewing the Larger Context of Genomic Data through Horizontal Integration. Proceedings of IEEE-CS 11th Int. Conf. on Information Visualization (IV'07) 326-334.

Huttenhower C, Flamholz AI, Landis JN, Sahi S, Myers CL, Olszewski KL, Hibbs MA, Siemers NO, Troyanskaya OG, Coller HA. 2007. Nearest Neighbor Networks: clustering expression data based on gene neighborhoods. BMC Bioinformatics 8:250. PMC1941745

Wallace G, Hibbs MA, Dunham M, Sealfon RSG, Troyanskaya OG, Li K. 2007. Scalable, Dynamic Analysis and Visualization for Genomic Datasets. Proceedings of IPDPS 2007 Workshop on Next Generation Software.

Huttenhower C, Hibbs MA, Myers CL, Troyanskaya OG. 2006. A scalable method for integration and functional analysis of multiple microarray datasets. Bioinformatics 22(23):2890-2897.

Myers CL, Barrett DR, Hibbs MA, Huttenhower C, Troyanskaya OG. 2006. Finding function: evaluation methods for functional genomic data. BMC Genomics 7:187. PMC1560386

Sealfon RS, Hibbs MA, Huttenhower C, Myers CL, Troyanskaya OG. 2006. GOLEM: an interactive graph-based gene-ontology navigation and analysis tool. BMC Bioinformatics 7:443. PMC1618863

Hibbs MA, Dirksen NC, Li K, Troyanskaya OG. 2005. Visualization methods for statistical analysis of microarray clusters. BMC Bioinformatics 6:115. PMC1156867

Li K, Hibbs MA, Wallace G, Troyanskaya OG. 2005. Dynamic Scalable Visualization for Collaborative Scientific Applications. Proceedings of IPDPS 2005 Workshop on Next Generation Software.

Myers CL, Robson D, Wible A, Hibbs MA, Chiriac C, Theesfeld CL, Dolinski K, Troyanskaya OG. 2005. Discovery of biological networks from diverse functional genomic data. Genome Biol 6(13):R114. PMC1414113

Wallace G, Anshus OJ, Bi P, Chen H, Chen Y, Clark D, Cook P, Finkelstein A, Funkhouser T, Gupta A, Hibbs M, Li K, Liu Z, Samanta R, Sukthankar R, Troyanskaya O. 2005. Tools and applications for large-scale display walls. IEEE Comput Graph Appl 25(4):24-33.

