The mouse is a key model organism for the understanding of mammalian biology because it has been well studied and is genetically and physiologically similar to humans. To utilize mouse data to its fullest, we have developed an integrated database of mouse genetic, genomic and biological data. The Mouse Genome Informatics Database (MGI) is used by the international scientific community as its primary resource for mouse information and as a tool for new biological discovery. The database contains a wide variety of data pertaining to genes, their DNA and protein sequences, and the phenotypes that result from mutations in different genes. The three central components of MGI are the Mouse Genome Database (MGD), an internationally recognized database for the laboratory mouse, the Mouse Tumor Biology (MTB) database, which facilitates the selection of experimental models for cancer research, and the International Mouse Strain Resource (IMSR), a searchable online database cataloging mouse stocks available worldwide. The database continues to expand to keep abreast of new technologies and to grow with our expanding knowledge of how the genetic blueprint of DNA manifests in traits of a living individual.
Bioinformatics Resources for the Mouse: From Sequence to Models of Human Disease
The mouse is an exceptional mammalian model system for connecting knowledge from sequence to human disease. The mouse has the unique combination of a well-developed genetic map, a sequenced genome, a large collection of inbred and genetically engineered strains, and extensive technologies to manipulate its genome. The Mouse Genome Informatics Database (MGI) integrates genetic, genomic, biological, and phenotypic data about the laboratory mouse to support the understanding of normal development and the mechanisms underlying heritable diseases.
Many diverse types of data, varying in granularity, representation, and organization, come together in MGI. These data are integrated by resolving object identities, by applying structured vocabularies (ontologies) for anatomy, gene function, phenotypes, tumors, and diseases, and by adhering to international nomenclature standards. MGI data integration is critical for posing complex questions and enabling retrieval of complete results. Integration encourages data exploration, provides connections between biological data concepts, and can lead to new insights and knowledge discovery. Three components of MGI are outlined below.
Mouse Genome Database (MGD)
MGD (www.informatics.jax.org) is the internationally recognized database for the laboratory mouse. Among the many types of data represented are genes and genomic features; nucleotide and protein sequences; molecular reagents; genomic, genetic, and cytogenetic maps; gene variants, particularly SNPs; orthology relationships between mouse genes and genes in other mammals; gene functions (GO; Gene Ontology); phenotypic descriptions of mouse mutants and inbred strains; and mouse genotypes that model human hereditary diseases. Data in MGD are updated continuously through semi-automated loads, curation of the primary scientific literature, and data submissions from the scientific community. Data exchange and co-curation through collaborations with major resources such as NCBI and UniProt ensure that all of these resources contain the highest quality data. MGD users benefit from links to key information resources such as ENSEMBL, UCSC, InterPro, MGC, OMIM, and others, enabled through shared data element identities. Our web interface allows simple and complex querying and flexible navigation among data pages and graphical displays. The back-end database and supporting software and front-end presentations continue to be improved and expanded to reflect community needs, new research technologies, and the growth and change in the character and types of data being generated.
Mouse Tumor Biology Database (MTB)
MTB (tumor.informatics.jax.org) facilitates the selection of experimental models for cancer research, allows cancer models to be evaluated as they apply to humans, presents mutation patterns for specific mouse cancers, and allows comparison across cancer models. MTB includes data on the organ and tissue of tumor origin, tumor classification, induction treatment, metastasis, mouse strains, and specific genes or mutations in the host strain or somatic changes in the tumor.
MTB incorporates data from many laboratories that collect, image, and diagnose mouse tumors. The Mouse Pathology Submission interface allows contributors to create records; generate tumor diagnoses including pathology and treatment descriptions; attach images to diagnoses; and edit and review their submitted data.
The Mouse Tumor Frequency Grid is an interactive visualization tool providing an overview of tumor frequencies and tumor types in different mouse strains (http://tumor.informatics.jax.org/mtbwi/tumorFrequencyGrid.do). Strains are displayed on the vertical axis and organs on the horizontal axis. Color-coded cells in the Grid represent tumor frequencies. Each colored cell is an active link, generating a database query for the underlying data. Users can expand the Grid axes for individual strain families and organ systems, thus "zooming-in" on specific sub-strains and tissues of interest.
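The layout of the Grid described above can be pictured with a small Python sketch: rows are strains, columns are organs, and each cell is assigned a color class from its tumor frequency. The strain names, frequencies, thresholds, and the helpers `build_grid` and `color_bin` are all hypothetical, chosen only to illustrate the idea; they do not reflect MTB's actual data or implementation.

```python
# Hypothetical observations: (strain, organ, percent of mice with tumors).
OBSERVATIONS = [
    ("StrainA", "lung", 42.0),
    ("StrainA", "liver", 3.5),
    ("StrainB", "lung", 0.0),
]

def build_grid(observations):
    """Arrange observations as grid[strain][organ] = frequency."""
    grid = {}
    for strain, organ, freq in observations:
        grid.setdefault(strain, {})[organ] = freq
    return grid

def color_bin(freq):
    """Map a frequency to an illustrative color class (thresholds are assumptions)."""
    if freq is None:
        return "no data"
    if freq == 0:
        return "none observed"
    if freq < 10:
        return "low"
    return "high"

grid = build_grid(OBSERVATIONS)
strains = sorted(grid)  # vertical axis
organs = sorted({organ for row in grid.values() for organ in row})  # horizontal axis

for strain in strains:
    cells = [color_bin(grid[strain].get(organ)) for organ in organs]
    print(strain, dict(zip(organs, cells)))
```

In this sketch a missing strain/organ combination is kept distinct from a measured frequency of zero, since an empty cell and a cell reporting no tumors carry different information.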
International Mouse Strain Resource (IMSR)
IMSR (www.imsr.org) is a searchable online database cataloging mouse stocks available worldwide, including inbred, mutant, and genetically engineered mice and mutant ES cell lines. The goal of the IMSR is to assist the international scientific community in locating and obtaining mouse resources for research. For each strain listed in the IMSR, users can access information about where the strain is available from (repository holder), in what state the strain is available (live, frozen embryo, frozen germplasm, ES cell line), link to descriptive information about the strain, link to data in MGD describing phenotypes of any mutations carried by the strain, and link to a form for contacting the strain holder to order the strain or ask for additional information. The reciprocal links between MGD phenotype records and IMSR strain holder information allow users to find mice based on gene or strain name or phenotype. IMSR currently includes data from 15 repository consortia representing 26 repository sites worldwide.
Accessing the Mouse Genome Informatics (MGI) Resource:
www.informatics.jax.org provides access to all MGI resources.
Web users can use MGI's browsing and searching tools and visual displays to navigate the site. Each web page includes a quick Search Box where a word or partial word can be entered and the type of data to be searched can be specified. For example, a user could enter "pax" into the Search Box and indicate that "Gene symbols/names" are to be searched. Various query forms allow users to ask more complex questions that involve multiple parameters and multiple types of data. For example, using the Phenotypes/Allele query form, one could ask "What mutant alleles located on Chromosome 11 between 40 and 100 Mb were created by gene targeting and are models for cardiomyopathy?"
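To make the structure of such a multi-parameter question concrete, here is a minimal Python sketch that applies the same four criteria (chromosome, position range, generation method, disease model) to a handful of made-up allele records. The `Allele` fields, the sample records, and the `find_alleles` helper are assumptions for illustration only; they are not the query form's actual data model.

```python
from dataclasses import dataclass

@dataclass
class Allele:
    symbol: str
    chromosome: str
    position_mb: float      # genomic position in megabases
    method: str             # how the allele was generated
    disease_models: tuple   # diseases this allele is a model for

# Made-up records, purely illustrative.
ALLELES = [
    Allele("AlleleA", "11", 55.2, "targeted", ("cardiomyopathy",)),
    Allele("AlleleB", "11", 120.0, "targeted", ("cardiomyopathy",)),
    Allele("AlleleC", "11", 70.1, "chemically induced", ("cardiomyopathy",)),
    Allele("AlleleD", "4", 60.0, "targeted", ("cardiomyopathy",)),
    Allele("AlleleE", "11", 98.5, "targeted", ("diabetes", "cardiomyopathy")),
]

def find_alleles(alleles, chromosome, start_mb, end_mb, method, disease):
    """Return alleles that satisfy every criterion at once."""
    return [
        a for a in alleles
        if a.chromosome == chromosome
        and start_mb <= a.position_mb <= end_mb
        and a.method == method
        and disease in a.disease_models
    ]

hits = find_alleles(ALLELES, "11", 40, 100, "targeted", "cardiomyopathy")
print([a.symbol for a in hits])  # prints ['AlleleA', 'AlleleE']
```

Each criterion on its own matches several records; only the conjunction narrows the result to the alleles of interest, which is what a multi-parameter query form provides.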
Computational users can access MGI data in nightly-generated tab-delimited data files containing key data elements downloadable for analyses. On request, we also provide SQL (structured query language) accounts for directly querying the database.
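As a rough illustration of working with a tab-delimited file, the Python sketch below reads a small in-memory sample and indexes its rows by accession ID. The column names, the sample rows, and the `load_report` helper are assumptions made for this example; each actual report file has its own layout, which should be checked before adapting the code.

```python
import csv
import io

# Hypothetical sample in the style of a tab-delimited report.
# Column names and rows are illustrative, not an actual file layout.
SAMPLE = (
    "accession_id\tsymbol\tchromosome\n"
    "ID:0001\tGeneA\t2\n"
    "ID:0002\tGeneB\t11\n"
)

def load_report(handle):
    """Read a tab-delimited report with a header row into a dict keyed by accession ID."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["accession_id"]: row for row in reader}

records = load_report(io.StringIO(SAMPLE))
print(records["ID:0002"]["symbol"])  # prints GeneB
```

For a downloaded file, the same function could be called on a handle opened with `open(path, newline="")`, provided the file has a header row with matching column names.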
MGI-LIST, an electronic bulletin board for the mouse genomics community, has over 2,000 subscribers. To subscribe, visit the website. User Support is available by phone (207-288-6445), fax (207-288-6132), or email to assist with using MGI data resources.
Read about other MGI collaborating projects in the reports by Drs. Blake, Bult, and Ringwald.
Blake JA, Bult CJ, Eppig JT, Kadin JA, Richardson JE, The Mouse Genome Database Group. The Mouse Genome Database: integration of and access to knowledge about the laboratory mouse. Nucleic Acids Res. epub ahead of print.
Kohler S, Doelken SC, Mungall CJ, Bauer S, Firth HV, Bailleul-Forestier I, Black GC, Brown DL, Brudno M, Campbell J, FitzPatrick DR, Eppig JT, et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. epub ahead of print.
Smith CM, Finger JH, Hayamizu TF, McCright IJ, Xu J, Berghout J, Campbell J, Corbani LE, Forthofer KL, Frost PJ, Miers D, Shaw DR, Stone KR, Eppig JT, Kadin JA, Richardson JE, Ringwald M. The mouse Gene Expression Database (GXD): 2014 update. Nucleic Acids Res. epub ahead of print.
Bult CJ, Eppig JT, Blake JA, Kadin JA, Richardson JE; the Mouse Genome Database Group. 2013. The Mouse Genome Database: Genotypes, Phenotypes, and Models of Human Disease. Nucleic Acids Res. 41(database issue):D885-91.
Heffner CS, Pratt CH, Babiuk RP, Sharma Y, Rockwood SF, Donahue LR, Eppig JT, Murray SA. 2012. Supporting conditional mouse mutagenesis with a comprehensive cre characterization resource. Nat Commun 3:1218. PMCID: PMC3514490
Murray SA, Eppig JT, Smedley D, Simpson EM, Rosenthal N. 2012. Erratum to: Beyond knockouts: cre resources for conditional mutagenesis. Mamm Genome 23(9-10):587-99.
Bello SM, Richardson JE, Davis AP, Wiegers TC, Mattingly CJ, Dolan ME, Smith CL, Blake JA, Eppig JT. 2012. Disease model curation improvements at Mouse Genome Informatics. Database (Oxford) 2012:bar063. PMCID: PMC3308153
Eppig JT, Blake JA, Bult CJ, Kadin JA, Richardson JE; the Mouse Genome Database Group. 2011. The Mouse Genome Database (MGD): comprehensive resource for genetics and genomics of the laboratory mouse. Nucleic Acids Res 40(D1):D881-D886. PMCID: PMC3245042
Ringwald M, Iyer V, Mason JC, Stone KR, Tadepally HD, Kadin JA, Bult CJ, Eppig JT, Oakley DJ, Briois S, Stupka E, Maselli V, Smedley D, Liu S, Hansen J, Baldock R, Hicks GG, Skarnes WC. 2011. The IKMC web portal: a central point of entry to data and resources from the International Knockout Mouse Consortium. Nucleic Acids Res. 39 (suppl 1): D849-D855. PMCID: PMC3013768
Gaudet P, Bairoch A, Field D, Sansone S-A, Taylor C, Attwood TK, Bateman A, Blake JA, Bult CJ, Cherry JM, Chisholm RL, Cochrane G, Cook CE, Eppig JT, Galperin MY, Gentleman R, Goble CA, Gojobori T, Hancock JM, Howe DG, Imanishi T, Kelso J, Landsman D, Lewis SE, Mizrachi IK, Orchard S, Ouellette BFF, Ranganathan S, Richardson L, Rocca-Serra P, Schofield PN, Smedley D, Southan C, Tan TW, Tatusova T, Whetzel PL, White O, Yamasaki C on behalf of the BioDBCore working group. 2011. Towards BioDBcore: a community-defined information specification for biological databases. Nucleic Acids Res 39(suppl 1): D7-D10. PMCID: PMC3017395
Finger JH, Smith CM, Hayamizu TF, McCright IJ, Eppig JT, Kadin JA, Richardson JE, Ringwald M. 2011. The Mouse Gene Expression Database (GXD): 2011 update. Nucleic Acids Res. 39(suppl 1): D835-D841. PMCID: PMC3013713
Blake JA, Bult CJ, Kadin JA, Richardson JE, Eppig JT and the Mouse Genome Database Group. 2011. The Mouse Genome Database (MGD): premier model organism resource for mammalian genomics and genetics. Nucleic Acids Res 39(suppl 1): D842-D848 PMCID: PMC3013640
Ringwald M, Eppig JT. 2011. Mouse mutants and phenotypes: accessing information for the study of mammalian gene function. Methods. 53(4):405-10. PMCID: PMC3062719.
Smedley D, Schofield P, Chen C-K, Aidinis V, Ainali C, Bard J, Balling R, Birney E, Blake A, Bongcam-Rudloff, Brookes AJ, Cesareni G, Chandras C, Eppig J, Flicek P, Gkoutos G, Greenway S, Gruenberger M, Heriche J-K, Lyall A, Mallon A-M, Muddyman D, Reisinger F, Ringwald M, Rosenthal N, Schughart K, Swertz M, Thorisson GA, Zouberakis M, Hancock JM. 2010. Finding and sharing: new approaches to registries of resources and services for the biomedical sciences. Database (Oxford). PMCID: PMC2911849
Schofield PN, Eppig J, Huala E, deAngelis MH, Harvey M, Davidson M, Weaver T, Brown S, Smedley D, Rosenthal N, Schughart K, Aidinis V, Tocchini-Valentini G, Hancock JM. 2010. Research funding. Sustaining the data and bioresource commons. Science 330(6004): 592-593. PMC (NA)
International Arabidopsis Information Consortium (Eppig JT). 2010. An international bioinformatics infrastructure to underpin the Arabidopsis community. Plant Cell 22(8): 2530-2536. PMCID: PMC2947164
Bult CJ, Kadin JA, Richardson JE, Blake JA, Eppig JT; Mouse Genome Database Group. 2010. The Mouse Genome Database: enhancements and updates. Nucleic Acids Res 38(DI):D586-92. PMCID: PMC2808942
Gene Ontology Consortium. 2010. The Gene Ontology in 2010: extension and refinements. Nucleic Acids Res 38(DI):D331-5. PMCID: PMC2808930
Smith CL and Eppig JT. 2009. The Mammalian Phenotype Ontology: enabling robust annotation and comparative analysis. Wiley Interdisciplinary Reviews: Systems Biology and Medicine 1(3):390-399. PMCID: PMC2801442
Blake JA, Bult CJ, Eppig JT, Kadin JA, Richardson JE; Mouse Genome Database Group. 2009. The Mouse Genome Database Genotype::phenotype. Nucleic Acids Res 37(DI):D712-9. PMCID: PMC2686566
Krupke DM, Begley DA, Sundberg JP, Bult CJ, Eppig JT. 2008. The Mouse Tumor Biology Database. Nature Reviews Cancer 8(6):456-65. PMCID: PMC2574871
Bult CJ, Eppig JT, Kadin JA, Richardson JE, Blake JA; Mouse Genome Database Group. 2008. The Mouse Genome Database (MGD): mouse biology and model systems. Nucleic Acids Res 36(DI):D724-8. PMCID: PMC2238849
Anagnostopoulos AV, Blake JA, Bult CJ, Ringwald M, Richardson JE, Kadin JA, Eppig JT. 2008. Using Bio-Ontologies as Data Annotation, Integration and Analytical Tools at the Mouse Genome Information Resource. IEEE (Oct 2008).
Smith CM, Finger JH, Hayamizu TF, McCright IJ, Eppig JT, Kadin JA, Richardson JE, Ringwald M. 2007. The Mouse Gene Expression Database (GXD): 2007 update. Nucleic Acids Res 35(DI):D618-23.
Mouse Phenome Database Integration Consortium. 2007. Integration of mouse phenome data resources. Mamm Genome 18:157-163.
Eppig JT, Blake JA, Bult CJ, Richardson JE, Kadin JA, Ringwald M and the MGI Staff. 2007. Mouse Genome Informatics (MGI) resources for pathology and toxicology. Toxicol Pathol 35:456-7.
Barker JE, Deveau SA, Compton ST, Fancher K, Eppig JT. 2005. High incidence, early onset of histiocytic sarcomas in mice with Hertwig's anemia. Exp Hematol. 33:1118-1129.
Eppig JT, Bult CJ, Kadin JA, Richardson JE, Blake JA and the Mouse Genome Database Group. 2005. The Mouse Genome Database (MGD): From Genes to Mice, A community Resource for Mouse Biology. Nucleic Acids Research 33:D471-5.
Krupke DM, Naf D, Vincent MJ, Allio T, Mikaelian I, Sundberg JP, Bult CJ, Eppig JT. 2005. The mouse tumor biology database: integrated access to mouse cancer biology data. Exp Lung Res 31:259-270.
Smith CL, Goldsmith CA, Eppig JT. 2005. The Mammalian Phenotype Ontology as a tool for annotating, analyzing, and comparing phenotypic information. Genome Biology 6(1):R7.
Strivens M, Eppig JT. 2004. Visualizing the laboratory mouse: capturing phenotypic information. Genetica 122:89-97.
Richardson JE, Kadin JA, Blake JA, Bult CJ, Eppig JT, Ringwald M and the Mouse Genome Informatics Group. 2004. From sipping on a straw to drinking from a fire hose; data integration in a public genome database. Proceedings of the 20th IEEE International Conference on Data Engineering (March) 795-798.
Hill DP, Begley DA, Finger JH, Hayamizu TF, McCright IJ, Smith CM, Beal JS, Corbani LE, Blake JA, Eppig JT, Kadin JA, Richardson JE, Ringwald M. 2004. The Mouse Gene Expression Database (GXD): updates and enhancements. Nucl Acids Res 32:D568-D571.
Mikaelian I, Nanney LB, Parman KS, Kusewitt DF, Ward JM, Naf D, Krupke DM, Eppig JT, Bult CJ, Seymour R, Ichiki T and Sundberg JP. 2004. Antibodies that Label Paraffin-Embedded Mouse Tissues: A Collaborative Endeavor. Toxicol Pathol 32:181-191.
Gene Ontology Consortium. 2004. The Gene Ontology (GO) database and informatics resource. Nucl Acids Res 32:D258-D261.
Bult CJ, Blake JA, Richardson JE, Kadin JA, Eppig JT, and the members of the Mouse Genome Database Group. 2004. The Mouse Genome Database (MGD) Integrating biology with the genome. Nucl Acids Res 32:D476-D481.
Baldarelli RM, Hill DP, Blake JA, Adachi J, Furuno M, Bradt D, Corbani LE, Cousins S, Frazer KS, Qi D, Yang L, Ramachandran S, Reed D, Zhu Y, Kasukawa T, Ringwald M, King BL, Maltais LJ, McKenzie LM, Schriml LM, Maglott D, Church DM, Pruitt K, Eppig JT, Richardson JE, Kadin JA, Bult CJ. 2003. Connecting sequence and biology in the laboratory mouse. Genome Res 13:1505-19.
Rowe LB, Barter ME, Kelmenson JA, Eppig JT. 2003. The comprehensive mouse radiation hybrid map densely cross-referenced to the recombination map: A tool to support the sequence assemblies. Genome Res 13:122-133.
Blake JA, Richardson JE, Bult CJ, Kadin JA, Eppig JT. 2003. MGD: The Mouse Genome Database. Nucleic Acids Res 31:193-195.
Blake JA, Richardson JE, Bult CJ, Kadin JA, Eppig JT, and the Mouse Genome Database Group. 2002. The Mouse Genome Database (MGD): The model organism database for the laboratory mouse. Nucleic Acids Res 30:113-115.
Eppig JT, Blake JA, Smith C, Burkart DL, Goldsmith CW, Lutz CM, Smith CL. 2002. Corralling Conditional Mutations: A unified resource for mouse phenotypes. Genesis 32:63-65.
Maltais LJ, Blake JA, Chu T, Lutz CM, Eppig JT, Jackson I. 2002. Rules and guidelines for mouse gene, allele, and mutation nomenclature: A condensed version. Genomics 79: 471-474.
Naf D, Krupke DM, Sundberg JP, Eppig JT, Bult CJ. 2002. The Mouse Tumor Biology database (MTB): A public resource for cancer genetics and pathology of the mouse. Cancer Res 62:1235-1240.
Twigger S, Lu J, Shimoyama M, Chen D, Pasko D, Long H, Ginster J, Chen C-F, Nigam R, Kwitek-Black A, Eppig J, Maltais L, Maglott D, Schuler G, Jacob H, Tonellato P. 2002. Rat Genome Database (RGD) – mapping disease onto the genome. Nucleic Acids Res 30:125-128.
Books, Book Chapters, and Reviews:
Blake JA, Eppig JT, Bult CJ. 2003. Mouse and Rat Genome Informatics. In: Bioinformatics For Geneticists, Wiley & Sons, Inc., pp. 119-142