Overview

Our focus is the Gene Expression Database (GXD), which captures and integrates mouse expression data generated by biomedical researchers worldwide, with particular emphasis on mouse development. Gene expression data can provide researchers with critical insights into the function of genes and the molecular mechanisms of development, differentiation and disease. By combining different types of expression data and adding new data on a daily basis, GXD provides increasingly complete information about expression profiles of transcripts and proteins in wild-type and mutant mice. We work closely with the other Mouse Genome Informatics (MGI) projects to provide the community with integrated access to genotypic, expression and phenotypic data. In addition, we are the Data Coordination Center for the NIH Knockout Mouse Project (KOMP). Together with similar efforts in Europe (EUCOMM) and Canada (NorCOMM), KOMP aims to inactivate (knock out), one at a time, every gene in the mouse genome. The resulting mutant stem cell lines or mice will be a significant resource for the research community.

Research details

 

The Mouse Gene Expression Database (GXD), the Data Coordination Center for the NIH Knockout Mouse Project (KOMP), and ontology projects for the cancer Biomedical Informatics Grid (caBIG)


The Gene Expression Database (GXD)

The Gene Expression Database captures and integrates mouse expression data generated by biomedical researchers worldwide. Data are acquired from the literature and via electronic data submissions from conventional laboratories and large-scale data providers, converted into standardized formats, and made freely available for complex biological queries. Particular emphasis is placed on RNA in situ, immunohistochemistry, Northern blot, Western blot, RT-PCR, RNase protection, nuclease S1, and cDNA source data that report on endogenous gene expression during mouse development. By combining different types of expression data, and by adding new data on a daily basis, GXD provides increasingly complete information about expression profiles of transcripts and proteins in wild-type and mutant mice.

Expression patterns from assays with differing spatial resolution are recorded in a consistent and integrated manner by using the hierarchically structured Anatomical Dictionary of Mouse Development built by our collaborators from the Edinburgh 3D Atlas and Gene Expression (EMAGE) project. Database records are linked to the primary image data. GXD currently holds about 45,000 images of original expression data. These images have been carefully indexed with respect to the genes analyzed, the probes used, the strain and genotype of the specimen, the developmental stages and anatomical structures in which expression was reported to be present or absent, and other parameters. Users can therefore search for expression data and images in many different ways.

GXD places the gene expression data in a larger biological context by interconnecting with many other data sets and resources. We are part of the Gene Ontology Consortium, which classifies genes and their products with regard to biological processes, molecular functions, and cellular components (see report by Dr. Judith A. Blake in this volume). Further, we work closely with the other Mouse Genome Informatics (MGI) projects (see reports by Drs. Janan T. Eppig, Judith A. Blake, and Carol J. Bult in this volume) to provide the community with integrated access to genotypic, sequence, expression, and phenotypic data, and we supply links to many external resources such as PubMed, Online Mendelian Inheritance in Man (OMIM), National Center for Biotechnology Information (NCBI) EntrezGene, sequence databases, and databases from other species. GXD is available through the MGI web site.
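
To illustrate why a hierarchically structured anatomy makes it possible to integrate results from assays with differing spatial resolution, the following minimal Python sketch may help. It is not GXD's actual schema or software: the anatomy fragment, gene names, annotations, and the function names `descendants` and `genes_expressed_in` are all hypothetical and chosen only for illustration. The idea shown is that a query for a structure also retrieves positive results annotated to any of its substructures.

```python
from collections import defaultdict

# Hypothetical fragment of an anatomy hierarchy: child term -> parent term.
# These terms are illustrative and are not taken from the actual dictionary.
PART_OF = {
    "cardiovascular system": "embryo",
    "heart": "cardiovascular system",
    "heart ventricle": "heart",
    "heart atrium": "heart",
}

# Hypothetical annotations: (gene, assay type, structure, expression detected).
# A blot on a dissected organ resolves less finely than an in situ experiment.
ANNOTATIONS = [
    ("GeneA", "RNA in situ", "heart ventricle", True),
    ("GeneA", "Northern blot", "heart", True),
    ("GeneB", "immunohistochemistry", "heart atrium", True),
    ("GeneB", "RT-PCR", "embryo", True),
    ("GeneC", "RNA in situ", "heart", False),
]


def descendants(term):
    """Return the term itself plus every structure that is part of it."""
    children = defaultdict(list)
    for child, parent in PART_OF.items():
        children[parent].append(child)
    found, stack = {term}, [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def genes_expressed_in(structure):
    """Genes with a positive result in the structure or any of its parts."""
    terms = descendants(structure)
    return sorted(
        {gene for gene, _assay, where, present in ANNOTATIONS
         if present and where in terms}
    )


if __name__ == "__main__":
    print(genes_expressed_in("heart"))
    print(genes_expressed_in("heart ventricle"))
```

In this sketch only positive results are propagated from substructures to the queried structure; how absence of expression should be treated across the hierarchy is a separate design question that the example deliberately leaves out.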

In 2006, we also released a new version of the Gene Expression Notebook (GEN). The GEN was developed to facilitate electronic submission of expression data from conventional laboratories. It can be used both as a laboratory notebook for storing expression data and as a tool for submitting those data electronically to GXD. Implemented in Microsoft Excel™, it is easy for biologists to use and customize. The GEN is freely available.

The Data Coordination Center (DCC) for the Knockout Mouse Project (KOMP)

Over the next five years, the NIH Knockout Mouse Project (KOMP) will focus on generating null-mutant ES cells and mice for every gene in the mouse genome for which no null mutants are publicly available. Work is being coordinated with similar projects in Europe (EUCOMM) and Canada (NorCOMM). The DCC will serve the KOMP research network laboratories as a central information resource regarding publicly available null and conditional mutants and will provide query and display tools to support prioritizing new mouse genes for knockout experiments. It will collect information generated by the KOMP, track progress of the knockout mutant production pipelines, and make the data readily available to all members of the KOMP research network to support, coordinate, and synergize their individual research programs. In addition, the DCC will serve as the central public interface for the KOMP, with links to all groups funded by the KOMP research network, as well as links to other efforts generating knockout mice. It will provide web-based query and display tools for KOMP data and mechanisms to download large data sets for further analysis. It will also export data to other relevant community databases such as Ensembl, the UCSC Genome Browser, NCBI, and Mouse Genome Informatics (MGI). The KOMP project started in the fall of 2006. Resources and information developed by the DCC since then are available through the KOMP-DCC home page.

The Mouse / Human Anatomy Project

As part of the cancer Biomedical Informatics Grid (caBIG) effort, we have completed our work on the Mouse/Human Anatomy Project. In collaboration with NCI, we performed an in-depth comparison of the Adult Mouse Anatomical Dictionary built by the GXD project and the Human Anatomical Dictionary developed as part of the NCI Thesaurus. We extended and harmonized both anatomical ontologies and established extensive mappings between them. As these ontologies are being used to describe and integrate multiple types of data that relate to anatomy for the respective species, such as gene expression and phenotype data, our harmonization and mapping work will enable closer integration of basic science research and clinical data pertinent to cancer and other human diseases.

Evaluation of vocabulary review criteria

Vocabularies and ontologies are crucial for data representation and integration in the caBIG. However, although a number of criteria that every caBIG "standard" vocabulary should satisfy had been drafted, these criteria had not yet been put to the test. In fact, it was not clear how a large vocabulary should or could be reviewed. The objective of this project was, therefore, to develop recommendations for a scalable and implementable review process for vocabularies proposed for use within the caBIG. To accomplish this, we developed a set of well-defined vocabulary review criteria based on the initial draft. We then evaluated a large controlled vocabulary, the Gene Ontology (GO), using these criteria, and thereby assessed the feasibility of their use for vocabulary review. Based on this work, we revised and refined the vocabulary review criteria and developed a recommendation for a scalable, operational vocabulary review process.

Lab staff

Principal Investigator: Martin Ringwald, Ph.D.
Co-Principal Investigators: Joel E. Richardson, Ph.D., Janan T. Eppig, Ph.D., Carol Bult, Ph.D., James A. Kadin, Ph.D.
Senior Scientific Curator: Constance M. Smith, Ph.D.
Scientific Curators: Jacqueline Finger, Ph.D., Terry Hayamizu, M.D., Ph.D., Ingeborg McCright, Ph.D.
Scientific Software Engineers: Peter Frost, B.S., Jeremy C. Mason, M.S.
Systems Administrator: Mike McCrossin, M.S.
Technical Writer: Diane J. Dahmen, M.A.
Administrative Assistants: Annie McDonnell, Janice E. Ormsby

Publication listings

Eppig JT, Blake JA, Bult CJ, Richardson JE, Kadin JA, Ringwald M, and the MGI Staff. 2007. Mouse Genome Informatics (MGI) resources for pathology and toxicology. Toxicol Pathol 35:456-7.

The FANTOM Consortium and RIKEN Genome Exploration Research Group and Genome Science Group. 2005. The Transcriptional Landscape of the Mammalian Genome. Science 309:1559-1563.

Hayamizu TF, Mangan M, Corradi JP and Ringwald M. 2005. The Adult Mouse Anatomical Dictionary: a tool for annotating and integrating data. Genome Biology 6:R29.

Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, et al. 2004. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 32:D258-D261.

Ball C, Brazma A, Causton H, Chervitz S, Edgar R, Hingamp P, Matese JC, Parkinson H, Quackenbush J, Ringwald M, Sansone SA, Sherlock G, Spellman P, Stoeckert C, Tateno Y, Taylor R, White J, Winegarden N. 2004. Standards for microarray data: an open letter. Environ Health Perspect 112(12):A666-7.

The Gene Ontology Consortium. 2004. The Gene Ontology (GO) Database and Informatics Resource. Nucleic Acids Research 32:D258-D261.

Ball CA, Brazma A, Causton H, Chervitz S, Edgar R, Hingamp P, Matese JC, Parkinson H, Quackenbush J, Ringwald M, Sansone SA, Sherlock G, Spellman P, Stoeckert C, Tateno Y, Taylor R, White J, Winegarden N. 2004. Submission of microarray data to public repositories. PLoS Biol 2(9):E317.

Richardson JE, Kadin JA, Blake JA, Bult CJ, Eppig JT, Ringwald M, and the Mouse Genome Informatics Group. 2004. From sipping on a straw to drinking from a fire hose; data integration in a public genome database. Proceedings of the 20th IEEE International Conference on Data Engineering, Boston MA, March 2004, pp 795-798.

Hill DP, Begley DA, Finger JH, Hayamizu TF, McCright IJ, Smith CM, Beal JS, Corbani LE, Blake JA, Eppig JT, Kadin JA, Richardson JE and Ringwald M. 2004. The Mouse Gene Expression Database (GXD): updates and enhancements. Nucleic Acids Res. 32:D568-D571.

Baldarelli RM, Hill DP, Blake JA, Adachi J, Furuno M, Bradt D, Corbani LE, Cousins S, Frazer KS, Qi D, Yang L, Ramachandran S, Reed D, Zhu Y, Kasukawa T, Ringwald M, King BL, Maltais LJ, McKenzie LM, Schriml LM, Maglott D, Church DM, Pruitt K, Eppig JT, Richardson JE, Kadin JA, Bult CJ. 2003. Connecting sequence and biology in the laboratory mouse. Genome Res. 13:1505-19.

Begley DA, Ringwald M. 2002. Electronic tools to manage gene expression data. Trends Genet 18:108-110.

Hill DP, Blake JA, Richardson JE, Ringwald M. 2002. Extension and Integration of the Gene Ontology (GO): Combining GO vocabularies with external vocabularies. Genome Res 12:1982-1991.

The FANTOM Consortium and the RIKEN Genome Exploration Research Group Phase I and II Team. 2002. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 420:563-573.

Books, Book Chapters, and Reviews:

Ringwald M. 2002. The Mouse Gene Expression Database (GXD). In: Analyzing Gene Expression, Lorkowski S, Cullen P, [eds], Wiley-VCH.
