Overview
Our focus is the Gene Expression Database (GXD), which captures and integrates mouse expression data generated by biomedical researchers worldwide, with particular emphasis on mouse development. Gene expression data can provide researchers with critical insights into the function of genes and the molecular mechanisms of development, differentiation and disease. By combining different types of expression data and adding new data on a daily basis, GXD provides increasingly complete information about expression profiles of transcripts and proteins in wild-type and mutant mice. We work closely with the other Mouse Genome Informatics (MGI) projects to provide the community with integrated access to genotypic, expression and phenotypic data. In addition, we are the Data Coordination Center for the NIH Knockout Mouse Project (KOMP). Together with similar efforts in Europe (EUCOMM) and Canada (NorCOMM), KOMP aims to inactivate (knock out), one at a time, every gene in the mouse genome. The resulting mutant stem cell lines or mice will be a significant resource for the research community.
Scientific report
Bioinformatics resources for the study of development, differentiation and disease
The Gene Expression Database for Mouse Development
The Gene Expression Database (GXD) captures and integrates mouse gene expression data generated by biomedical researchers worldwide. Data are acquired from the literature and via electronic data submissions from conventional laboratories and large-scale data providers, converted into standardized formats, and made freely available for complex biological queries. GXD places particular emphasis on RNA in situ, immunohistochemistry, Northern blot, Western blot, RT-PCR, RNase protection, nuclease S1 and cDNA source data that report on endogenous gene expression during mouse development. By combining different types of expression data and adding new data on a daily basis, GXD provides increasingly complete information about the expression profiles of transcripts and proteins in wild-type and mutant mice.
Expression patterns are recorded in a consistent and integrated manner by using the hierarchically structured Anatomical Dictionary of Mouse Development built by our collaborators from the Edinburgh 3D Atlas and Gene Expression (EMAGE) project. Database records are linked to the primary image data. GXD currently holds over 67,000 images of original expression data. These images have been carefully indexed with respect to the genes analyzed, the probes used, the strain and genotype of the specimen, the developmental stages and anatomical structures in which expression was reported to be present or absent, and other parameters. One can, therefore, search for the expression data and images in many different ways.
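To illustrate why indexing against a hierarchical anatomy supports searching in many different ways, here is a minimal sketch. It is not GXD's actual schema or software: the structure names, the placeholder gene names, the integer stage labels, and the helpers `is_within` and `genes_expressed_in` are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical part-of hierarchy: each structure maps to its parent structure.
PARENT = {
    "cardiovascular system": "embryo",
    "heart": "cardiovascular system",
    "atrium": "heart",
    "ventricle": "heart",
}


@dataclass(frozen=True)
class Annotation:
    """One indexed expression result (illustrative fields only)."""
    gene: str        # placeholder gene name
    stage: int       # developmental stage label (assumed to be an integer)
    structure: str   # anatomical structure the result refers to
    present: bool    # True if expression was reported present, False if absent


def is_within(structure: Optional[str], ancestor: str) -> bool:
    """Return True if `structure` is `ancestor` or one of its substructures."""
    while structure is not None:
        if structure == ancestor:
            return True
        structure = PARENT.get(structure)  # climb one level; None at the root
    return False


def genes_expressed_in(annotations, structure, stage=None):
    """Genes reported present in `structure` or any of its substructures."""
    return sorted({
        a.gene
        for a in annotations
        if a.present
        and is_within(a.structure, structure)
        and (stage is None or a.stage == stage)
    })


ANNOTATIONS = [
    Annotation("GeneA", 17, "atrium", True),
    Annotation("GeneB", 17, "ventricle", False),
    Annotation("GeneC", 20, "heart", True),
]

print(genes_expressed_in(ANNOTATIONS, "heart"))
print(genes_expressed_in(ANNOTATIONS, "heart", stage=17))
```

Because every record names a node in the hierarchy, a query for a broad structure such as "heart" also picks up results annotated to its substructures, while results recorded as absent are kept but excluded from "expressed in" queries.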
GXD places the gene expression data in the larger biological context by interconnecting with many other data and resources. We work closely with the other Mouse Genome Informatics (MGI) projects (see also Janan T. Eppig, Ph.D., Judith A. Blake, Ph.D., and Carol J. Bult, Ph.D.) to provide the community with integrated access to genotypic, sequence, expression, functional, and phenotypic data. We also supply links to many external resources, such as the Allen Brain Atlas, GENSAT, EMAGE, GenePaint, PubMed, Online Mendelian Inheritance in Man (OMIM), National Center for Biotechnology Information (NCBI) Entrez Gene and GEO, sequence databases, InterPro, and databases from other species. GXD is available through the MGI web site (http://www.informatics.jax.org/) or directly at http://www.informatics.jax.org/menus/expression_menu.shtml.
The Data Coordination Center for the Knockout Mouse Project (KOMP)
Together with similar projects in Europe (EUCOMM) and Canada (NorCOMM), the NIH Knockout Mouse Project (KOMP) aims to generate mutant ES cells (conditional and null mutants) for every protein-coding gene in the mouse genome. The Data Coordination Center (DCC) serves the KOMP research network laboratories as a central information resource regarding publicly available null and conditional mutants and provides query and display tools to support prioritizing new mouse genes for knockout experiments. It collects information generated by the KOMP and tracks progress of the knockout mutant production pipelines. The data are made readily available to all members of the KOMP research network to support, coordinate and synergize their individual research programs.
In addition, the DCC serves as the central public interface for the KOMP, with links to all groups funded by the KOMP research network, as well as links to other efforts generating knockout mice. The DCC provides web-based query and display tools for KOMP data. Users can easily find out if a gene is targeted by KOMP, determine its status in the knockout production pipelines, look up the molecular details of the mutation, determine what products (targeting vectors, ES cells, mice) are already available, and proceed with ordering products from the KOMP Repository. In addition, the DCC web interface allows researchers to nominate genes for targeting by KOMP. The DCC also exports data to other relevant community databases such as Ensembl, the UCSC Genome Browser, NCBI and MGI. The KOMP-DCC website is available at http://www.knockoutmouse.org/.
Data coordination and a common web portal for the International Knockout Mouse Consortium
The members of the International Knockout Mouse Consortium (IKMC) are working together to mutate all protein-coding genes in the mouse using a combination of gene trapping and gene targeting in C57BL/6 mouse embryonic stem (ES) cells. The IKMC includes the Knockout Mouse Project (KOMP), the European Conditional Mouse Mutagenesis Program (EUCOMM), the North American Conditional Mouse Mutagenesis Project (NorCOMM), and the high-throughput gene trapping effort by the Texas A&M Institute for Genomic Medicine (TIGM). In addition to our work on the KOMP-DCC project, we are participating in the International Data Coordination Center (I-DCC) project, recently funded by the European Union. The two projects have joined forces to develop informatics tools that enable the effective coordination and prioritization of work, and to establish a common web portal for information on targeting vectors, ES cells and mouse knockouts created by all IKMC projects.
Evaluation of Vocabulary Review Criteria
Vocabularies and ontologies are crucial for data representation and integration in the cancer Biomedical Informatics Grid (caBIG). The objective of this project was to develop recommendations for a scalable and implementable review process for vocabularies proposed for use within the caBIG. In previous work, we had developed an initial set of well-defined vocabulary review criteria and then evaluated a large controlled vocabulary, the Gene Ontology (GO), using these criteria, thereby assessing the feasibility of their use for vocabulary review. Together with Drs. James Cimino (National Institutes of Health), Olivier Bodenreider (National Library of Medicine), Grace Stafford (The Jackson Laboratory) and other caBIG participants, we tested and further refined the vocabulary review criteria by evaluating three additional large standard terminologies. The joint effort resulted in unified, final recommendations for a vocabulary review process that are now being used by the caBIG.
Lab staff
Principal Investigator: Martin Ringwald, Ph.D.
Co-Principal Investigators: Joel E. Richardson, Ph.D., Janan T. Eppig, Ph.D., Carol Bult, Ph.D., James A. Kadin, Ph.D.
Senior Scientific Curator: Constance M. Smith, Ph.D.
Scientific Curators: Jacqueline Finger, Ph.D., Terry Hayamizu, M.D., Ph.D., Ingeborg McCright, Ph.D., Hamsa Tadepally, M.S.
Scientific Software Engineers: Peter J. Frost, B.S., Jeremy C. Mason, M.S.
Systems Administrator: Mike McCrossin, M.S.
Technical Writer: Diane J. Dahmen, M.A.
Administrative Assistants: Annie McDonnell, Janice E. Ormsby
Publication listings
Eppig JT, Blake JA, Bult CJ, Richardson JE, Kadin JA, Ringwald M, and the MGI Staff. 2007. Mouse Genome Informatics (MGI) resources for pathology and toxicology. Toxicol Pathol 35:456-7.
The FANTOM Consortium and RIKEN Genome Exploration Research Group and Genome Science Group. 2005. The Transcriptional Landscape of the Mammalian Genome. Science 309:1559-1563.
Hayamizu TF, Mangan M, Corradi JP and Ringwald M. 2005. The Adult Mouse Anatomical Dictionary: a tool for annotating and integrating data. Genome Biology 6:R29.
Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, et al. 2004. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 32:D258-D261.
Ball C, Brazma A, Causton H, Chervitz S, Edgar R, Hingamp P, Matese JC, Parkinson H, Quackenbush J, Ringwald M, Sansone SA, Sherlock G, Spellman P, Stoeckert C, Tateno Y, Taylor R, White J, Winegarden N. 2004. Standards for microarray data: an open letter. Environ Health Perspect 112(12):A666-7.
The Gene Ontology Consortium. 2004. The Gene Ontology (GO) Database and Informatics Resource. Nucleic Acids Research 32:D258-D261.
Ball CA, Brazma A, Causton H, Chervitz S, Edgar R, Hingamp P, Matese JC, Parkinson H, Quackenbush J, Ringwald M, Sansone SA, Sherlock G, Spellman P, Stoeckert C, Tateno Y, Taylor R, White J, Winegarden N. 2004. Submission of microarray data to public repositories. PLoS Biol 2(9):E317.
Richardson JE, Kadin JA, Blake JA, Bult CJ, Eppig JT, Ringwald M, and the Mouse Genome Informatics Group. 2004. From sipping on a straw to drinking from a fire hose: data integration in a public genome database. Proceedings of the 20th IEEE International Conference on Data Engineering, Boston MA, March 2004, pp 795-798.
Hill DP, Begley DA, Finger JH, Hayamizu TF, McCright IJ, Smith CM, Beal JS, Corbani LE, Blake JA, Eppig JT, Kadin JA, Richardson JE and Ringwald M. 2004. The mouse Gene Expression Database (GXD): updates and enhancements. Nucleic Acids Res 32:D568-D571.
Baldarelli RM, Hill DP, Blake JA, Adachi J, Furuno M, Bradt D, Corbani LE, Cousins S, Frazer KS, Qi D, Yang L, Ramachandran S, Reed D, Zhu Y, Kasukawa T, Ringwald M, King BL, Maltais LJ, McKenzie LM, Schriml LM, Maglott D, Church DM, Pruitt K, Eppig JT, Richardson JE, Kadin JA, Bult CJ. 2003. Connecting sequence and biology in the laboratory mouse. Genome Res 13:1505-19.
Begley DA, Ringwald M. 2002. Electronic tools to manage gene expression data. Trends Genet 18:108-110.
Hill DP, Blake JA, Richardson JE, Ringwald M. 2002. Extension and Integration of the Gene Ontology (GO): Combining GO vocabularies with external vocabularies. Genome Res 12:1982-1991.
The FANTOM Consortium and the RIKEN Genome Exploration Research Group Phase I and II Team. 2002. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 420:563-573.
Books, Book Chapters, and Reviews:
Ringwald M. 2002. The Mouse Gene Expression Database (GXD). In: Analyzing Gene Expression, Lorkowski S, Cullen P (eds.), Wiley-VCH.