Overview
The activities of my research group and collaborations focus on the development of bioinformatics systems essential for functional genomics, genetics and phenotypic research. The sequencing of mouse, human and other genomes and the rapid accumulation of very large data sets have resulted in an overwhelming amount of information from multiple sources, in a variety of content types and formats. The challenge is to bring all the data together and make them easily accessible to researchers, both directly and for additional computer analysis. Our current research centers on combining bio-ontologies (defined, controlled, structured vocabularies) and database systems to identify molecular elements that contribute to the processes of particular diseases, such as lung cancer. This work is undertaken as part of the Gene Ontology Consortium, a group of 19 model organism databases and genome annotation centers. My group, as part of the Mouse Genome Informatics Consortium at The Jackson Laboratory, is responsible for the functional and comparative annotation of mouse genes.
Research details
Functional and Comparative Genome Informatics
My research focuses on functional and comparative genome informatics. I work on the development of systems to integrate and interrogate genetic, genomic and phenotypic information. I am one of the leaders of the Gene Ontology (GO) project and I have been deeply involved with the work of the GO Consortium since its inception. The Gene Ontology project is an international effort to provide controlled structured vocabularies for molecular biology that serve as terminologies, classifications and ontologies to further data integration, analysis and reasoning. My interest in bio-ontologies stems as well from the work I do as a principal investigator with the Mouse Genome Informatics (MGI) project at The Jackson Laboratory. The MGI system is a model organism community database resource that provides integrated information about the genetics, genomics and phenotypes of the laboratory mouse. My current research projects combine bio-ontologies and database knowledge systems to represent disease processes with the objective of discovering molecular elements that contribute to particular pathologies such as lung cancer.
The Gene Ontology Consortium
Widespread use of the GO system for functional annotation of genomes enables comparative analysis of genome-size data sets. Understanding and supporting the GO annotation process and bringing new groups into the GO community is essential to the continued development of a broad, integrated network of biological information that can be transparently shared to enable and advance knowledge discovery. The GO Consortium group now consists of 19 model organism databases and genome-annotation groups who work cooperatively to construct the GO bio-ontologies, to provide functional annotations for a wide variety of organisms, and to support a GO information resource. GO participants located at The Jackson Laboratory lead ontology development projects, develop new software applications for the GO project, and provide GO annotations for mouse gene products. Other core groups of the GO project include an ontology development group based at the European Bioinformatics Institute in the United Kingdom, a software and resource development group based at Lawrence Berkeley National Laboratory, and a production database group based at Stanford University. Notable accomplishments of the GO Consortium this year included:
- initiation of the 'reference genome' project to provide comprehensive functional annotations across the gene ortholog sets of the major model organisms for genes implicated in human diseases;
- revision of subtrees of the GO to update and extend the representation of early central nervous system development processes;
- development of new community functional annotation systems for genes involved in immunological processes;
- ontological improvements including "is_a complete" updates for all domains; and
- publishing of an electronic GO newsletter to inform our community on GO developments.
The Mouse Genome Informatics Project
MGI supports scientific research that uses the laboratory mouse as a model for the study of human biology and disease. MGI data are curated both from the biomedical literature and from co-curated data loads from other major bioinformatics resources. My research group is responsible for the functional and comparative annotation of mouse genes in the MGI resource. This work includes defining the mouse gene set (in co-curation with other informatics resource providers), indexing the biomedical literature for functional annotation, providing official gene nomenclature along with a robust set of synonyms, and extending the representation of relationships between mouse, human and rat genes and genomes. We work closely with the MGI Sequences and Sequence Maps group (see report by Carol J. Bult) to resolve sequence-based inconsistencies in the representations of the mouse genome and the sequence and mapping data integrated in MGI and between MGI and other informatics resource centers such as the NCBI, Ensembl and the UniProt groups. We also work closely with the MGI Phenotypes group (see report by Janan Eppig) to support the development of standards for the representation of phenotype/genotype data in MGI.
Major projects this year included:
- incorporation of orthology sets for chimpanzee and dog genes and gene models to complement our existing focus on human and rat orthologs;
- collaborations with scientific community experts in the nomenclature revisions of 16 gene families, including tubulin, kallikrein, keratin, late cornified envelope, and vomeronasal 2 receptor; and
- revision of the literature review and curation process to reflect emergence of electronic publication resources.
Lab staff
Principal Investigator:
Co-Principal Investigators and Subcontract Principal Investigators:
Senior Scientific Curators:
Scientific Curators:
Bioinformatics Analysts:
Scientific Curatorial Assistant:
Information Specialist II:
Research Administrative Assistant:
Note: Dr. Blake’s program also funds staff at external institutions.
Publication listings
Selected Recent Publications (2003-2008):
Bult CJ, Eppig JT, Kadin JA, Richardson JE, Blake JA; Mouse Genome Database Group. 2008. The Mouse Genome Database (MGD): mouse biology and model systems. Nucleic Acids Res. 36(Database):D724-8.
The Gene Ontology Consortium*. 2008. The Gene Ontology (GO) Project in 2008. Nucleic Acids Res. 36(Database issue): D440-4.
Mouse Phenotype Database Integration Consortium. 2007. Integration of mouse phenome data resources. Mamm Genome 18:157-163.
Eppig JT, Blake JA, Bult CJ, Richardson JE, Kadin JA, Ringwald M, and the MGI Staff. 2007. Mouse Genome Informatics (MGI) resources for pathology and toxicology. Toxicol Pathol 35:456-7.
Diehl AD, Lee JA, Scheuermann RH, Blake JA. 2007. Ontology development for biological systems: Immunology. Bioinformatics Jan 31:epub ahead of print.
Eppig JT, Blake JA, Bult CJ, Kadin JA, Richardson JE, and the Mouse Genome Informatics Group. 2007. The Mouse Genome Database (MGD): new features facilitating a model system. Nucleic Acids Research 35(Database Issue): D630-7.
Blake JA, Bult CJ. 2006. Beyond the data deluge: data integration and bio-ontologies. J Biomed Inform 39(3):314-320.
Blake JA, Eppig JT, Bult CJ, Kadin JA, Richardson JE and Mouse Genome Database Group (MGD). 2006. The Mouse Genome Database (MGD): updates and enhancements. Nucleic Acids Research 34:D562-D567.
Dolan ME, Camon E, Ni L, Blake JA. 2005. A Procedure for Assessing GO Annotation Consistency. Bioinformatics 21(Suppl 1):i136-i143.
Santos C, Blake J, States DJ. 2005. Supplementary data need to be kept in public repositories. Nature 438:738.
Eppig JT, Bult CJ, Kadin JA, Richardson JE, Blake JA, Anagnostopoulos A, Baldarelli RM, Baya M, Beal JS et al. 2005. The Mouse Genome Database (MGD): from genes to mice--a community resource for mouse biology. Nucleic Acids Res 33:D471-5.
Drabkin HJ, Hollenbeck C, Hill DP, and Blake J. 2005. Ontological visualization of protein-protein interactions. BMC Bioinformatics 6:29.
Tuason O, Chen L, Liu H, Blake JA, Friedman C. 2004. Biological nomenclatures: Source of lexical knowledge and ambiguity. Proceedings of the Pacific Symposium on Biocomputing 9:238-249.
The Gene Ontology Consortium. 2004. The Gene Ontology (GO) Database and Informatics Resource. Nucleic Acids Research 32: D258-D261.
Blake J. 2004. Bio-ontologies fast and furious. Nat Biotechnol 22(6):773-4.
Evsikov AV, de Vries WN, Peaston AE, Radford EE, Fancher KS, Chen FH, Blake JA, Bult CJ, Latham KE, Solter D, Knowles BB. 2004. Systems biology of the 2-cell mouse embryo. Cytogenet Genome Res 105:240-250.
Hill DP, Begley DA, Finger JH, Hayamizu TF, McCright IJ, Smith CM, Beal JS, Corbani LE, Blake JA, Eppig JT, Kadin JA, Richardson JE, Ringwald M. 2004. The Mouse Gene Expression Database (GXD): Updates and enhancements. Nucleic Acids Research 32:D568-D571.
Bult CJ, Blake JA, Richardson JE, Kadin JA, Eppig JT, Mouse Genome Database Group. 2004. The Mouse Genome Database (MGD): Integrating biology with the genome. Nucleic Acids Research 32: D476-D481.
Bada M, Stevens R, Goble C, Gil Y, Ashburner M, Blake JA, Cherry JM, Harris M, Lewis S. 2004. A short study on the success of the gene ontology. Journal of Web Semantics 1(2).
Blake JA. Genomics and conservation biology. Special Publication, Conservation Genetics in the Age of Genomics Symposium. American Museum of Natural History, N.Y., in press.
Books, Book Chapters, and Reviews
Blake JA, Eppig JT, Bult CJ. 2003. Mouse and Rat Genome Informatics. In: Bioinformatics for Geneticists, Barnes MR, Gray IC, [eds], Wiley Press, London, U.K., pp. 119-142.
Blake JA, Harris M. 2003. The Gene Ontology Project: Structured vocabularies for molecular biology and their application to genome and expression analysis. In: Current Protocols in Bioinformatics. Baxevanis AD, Davison DB, Page R, Stormo G, Stein L, [eds], Wiley & Sons, Inc., New York, N.Y.