Overview

My research focuses on functional and comparative genome informatics. I work on the development of systems to integrate and interrogate genetic, genomic and phenotypic information. I am one of the leaders of the Gene Ontology (GO) project and I have been deeply involved with the work of the GO Consortium since its inception. The Gene Ontology project is an international effort to provide controlled structured vocabularies for molecular biology that serve as terminologies, classifications and ontologies to further data integration, analysis and reasoning. My interest in bio-ontologies stems as well from the work I do as a principal investigator with the Mouse Genome Informatics (MGI) project at The Jackson Laboratory. The MGI system is a model organism community database resource that provides integrated information about the genetics, genomics and phenotypes of the laboratory mouse. My current research projects combine bio-ontologies and database knowledge systems to represent disease processes with the objective of discovering molecular elements that contribute to particular pathologies such as lung cancer.

Scientific report

Functional and Comparative Genome Informatics

The Gene Ontology Consortium

Widespread use of the GO system for functional annotation of genomes enables comparative analysis of genome-size data sets. Understanding and supporting the GO annotation process and bringing new groups into the GO community are essential to the continued development of a broad, integrated network of biological information that can be transparently shared to enable and advance knowledge discovery. The GO Consortium now consists of 19 model organism databases and genome-annotation groups that work cooperatively to construct the GO bio-ontologies, to provide functional annotations for a wide variety of organisms, and to support a GO information resource. GO participants located at The Jackson Laboratory lead ontology development projects, develop new software applications for the GO project, and provide GO annotations for mouse gene products. Other core groups of the GO project include an ontology development group based at the European Bioinformatics Institute in the United Kingdom, a software and resource development group based at Lawrence Berkeley National Laboratory, and a production database group based at Stanford University. Notable accomplishments of the GO Consortium this year included:

  • initiation of the 'reference genome' project to provide comprehensive functional annotations across the gene ortholog sets of the major model organisms for genes implicated in human diseases;
  • revision of subtrees of the GO to update and extend the representation of early central nervous system development processes;
  • development of new community functional annotation systems for genes involved in immunological processes;
  • ontological improvements including "is_a complete" updates for all domains; and
  • publishing of an electronic GO newsletter to inform our community on GO developments.

Our work with the Gene Ontology Consortium complements our local continuing efforts to enhance the representation of genes and gene products in the MGI system. We contribute an updated mouse GO annotation file to the GO database each week.

The Mouse Genome Informatics Project

MGI supports scientific research that uses the laboratory mouse as a model for the study of human biology and disease. MGI data are curated both from the biomedical literature and from co-curated data loads from other major bioinformatics resources. My research group is responsible for the functional and comparative annotation of mouse genes in the MGI resource. This work includes defining the mouse gene set (in co-curation with other informatics resource providers), indexing the biomedical literature for functional annotation, providing official gene nomenclature along with a robust set of synonyms, and extending the representation of relationships between mouse, human and rat genes and genomes. We work closely with the MGI Sequences and Sequence Maps group (see report by Carol J. Bult) to resolve sequence-based inconsistencies in the representations of the mouse genome, both within the sequence and mapping data integrated in MGI and between MGI and other informatics resource centers such as NCBI, Ensembl and UniProt. We also work closely with the MGI Phenotypes group (see report by Janan Eppig) to support the development of standards for the representation of phenotype/genotype data in MGI.

Major projects this year included:

  • incorporation of orthology sets for chimpanzee and dog genes and gene models to complement our existing focus on human and rat orthologs;
  • collaborations with scientific community experts in the nomenclature revisions of 16 gene families, including tubulin, kallikrein, keratin, late cornified envelope, and vomeronasal 2 receptor; and
  • revision of the literature review and curation process to reflect emergence of electronic publication resources.

MGI-GO Scientific Curators are using a combination of algorithmic and manual approaches to update annotations of mouse gene products to the GO vocabularies. Currently, more than 17,500 mouse genes have at least preliminary GO annotations and over 9,700 have annotations based on experimental assays in mouse. We use data-mining and other strategies to semi-automate gene annotation to the GO. The highest quality annotations, however, depend on skilled scientific curators who review published literature for information that provides experimental verification for the GO attributions.
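
As a rough illustration only, and not a description of the MGI pipeline, counts like those above could be derived from a GO annotation file in the tab-delimited GAF format, where column 2 holds the gene identifier and column 7 the evidence code. The sketch below assumes that layout and treats the codes EXP, IDA, IPI, IMP, IGI and IEP as experimental; the function name `summarize_gaf` and the sample identifiers are hypothetical.

```python
from typing import Iterable, Tuple

# Evidence codes treated as "experimental" in this sketch (an assumption
# about which codes to count; adjust to the evidence policy in use).
EXPERIMENTAL_CODES = frozenset({"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"})


def summarize_gaf(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (genes with any annotation, genes with experimental annotation)."""
    annotated = set()
    experimental = set()
    for line in lines:
        # Skip blank lines and GAF header/comment lines, which start with "!".
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            continue  # malformed row: too few columns to hold an evidence code
        gene_id, evidence = cols[1], cols[6]
        annotated.add(gene_id)
        if evidence in EXPERIMENTAL_CODES:
            experimental.add(gene_id)
    return len(annotated), len(experimental)


# Made-up rows for illustration; the gene identifiers and references are not real.
SAMPLE = [
    "!gaf-version: 2.0",
    "MGI\tMGI:0000001\tGeneA\t\tGO:0005515\tPMID:0000001\tIPI\t\tF",
    "MGI\tMGI:0000001\tGeneA\t\tGO:0008150\tMGI:MGI:0000000\tND\t\tP",
    "MGI\tMGI:0000002\tGeneB\t\tGO:0003674\tMGI:MGI:0000000\tIEA\t\tF",
]

total_genes, experimental_genes = summarize_gaf(SAMPLE)
```

Counting distinct gene identifiers rather than rows matters here, because a single gene usually carries several annotations with different evidence codes.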

Lab staff

Principal Investigator:
Judith A. Blake, Ph.D.

Co-Principal Investigators and Subcontract Principal Investigators:
Michael Apweiler, Ph.D., European Molecular Biology Laboratory
Rolf Apweiler, Ph.D., European Molecular Biology Laboratory
Carol J. Bult, Ph.D., The Jackson Laboratory
J. Michael Cherry, Ph.D., Stanford University
Janan T. Eppig, Ph.D., The Jackson Laboratory
James A. Kadin, Ph.D., The Jackson Laboratory
Suzanna Lewis, M.S., Lawrence Berkeley National Laboratory
Seung Rhee, Ph.D., Carnegie Institution of Washington
Joel E. Richardson, Ph.D., The Jackson Laboratory
Paul W. Sternberg, Ph.D., California Institute of Technology
Simon Twigger, Ph.D., Medical College of Wisconsin

Senior Scientific Curators:
Alexander Diehl, Ph.D.
Lois Maltais, B.S.
Harold J. Drabkin, Ph.D.

Scientific Curators:
Li Ni, Ph.D.
Beverly Richards-Smith, Ph.D.
Dmitry Sitnikov, Ph.D.

Bioinformatics Scientist:
David P. Hill, Ph.D.

Bioinformatics Analysts:
Mary Dolan, Ph.D.
Kim Forthofer, M.S.

Scientific Curatorial Assistant:
Monica McAndrews-Hill, B.S.

Database User Support Specialist:
Susan McClatchy, M.S.

Information Specialist II:
Nancy E. Butler

Research Administrative Assistant:
Christina Gagliardi 

Note: Dr. Blake’s program also funds staff at external institutions.

Publication listings

(2005-present):

Arighi CN, Liu H, Natale DA, Barker WC, Drabkin H, Blake JA, Smith B, Wu CH. 2009. TGF-beta signaling proteins and the Protein Ontology. BMC Bioinformatics 10(Suppl 5):S3. PMC2679403

Blake JA, Bult CJ, Eppig JT, Kadin JA, Richardson JE; Mouse Genome Database Group. 2009. The Mouse Genome Database genotypes::phenotypes. Nucleic Acids Res 37(Database):D712-D719.

Dolan ME, Blake JA. 2009. Using ontology visualization to facilitate access to knowledge about human disease genes. Applied Ontology 4(1):35-49.

Sam LT, Mendonca EA, Li J, Blake J, Friedman C, Lussier YA. 2009. PhenoGO: an integrated resource for the multiscale mining of clinical and biological data. BMC Bioinformatics 10(Suppl 2):S8. PMC2646241

Diehl AD, Augustine AD, Blake JA, Cowell LG, Gold ES, Gondre-Lewis TA, Masci AM, Meehan TF, Morel PA, NIAID Cell Ontology Working Group, Nijnik A, Peters B, Pulendran B, Scheuermann RH, Zand MS, Mungall CJ. 200_. Hematopoietic Cell Types: Prototype for a Revised Cell Ontology. Proceedings of ICBO, (in press).

Joslyn C, Baddeley B, Blake J, Bult C, Dolan M, Riensche R, Rodland K, Sanfilippo A, White A. 200_. Automated Annotation-Based Bio-Ontology Alignment with Structural Validation. Proceedings of ICBO, (in press).

The Reference Genome Group of the Gene Ontology Consortium. 200_. The Gene Ontology's Reference Genome Project: A Unified Framework for Functional Annotation across Species. PLoS Computational Biology, (in press).

Alterovitz G, Xiang M, Hill DP, Lomax J, Liu J, Cherkassky M, Mungall C, Harris MA, Dolan ME, Blake JA, Ramoni MF. 200_. Ontology Engineering. Nature Biotech, (submitted with revisions).

Altman RB, Bergman CM, Blake J, Blaschke C, Cohen A, Gannon F, Grivell L, Hahn U, Hersh W, Hirschman L, Jensen LJ, Krallinger M, Mons B, O'Donoghue SI, Peitsch MC, Rebholz-Schuhmann D, Shatkay H, Valencia A. 2008. Text mining for biology-the way forward: opinions from leading scientists. Genome Biol 9(Suppl 2):S7. PMC2559991

Blake JA, Harris MA. 2008. The Gene Ontology (GO) project: structured vocabularies for molecular biology and their application to genome and expression analysis. Curr Protoc Bioinformatics Chapter 7:Unit 7.2.

Bult CJ, Eppig JT, Kadin JA, Richardson JE, Blake JA; Mouse Genome Database Group. 2008. The Mouse Genome Database (MGD): mouse biology and model systems. Nucleic Acids Res 36(Database):D724-D728.

Drabkin HJ, Arighi C, Wu C, Blake JA. 2008. Functional Annotation of Protein Isoforms and Modified Forms. In: International Conference on Bioinformatics & Computational Biology. Arabnia HR, Yang MG, Yang JY eds. CSREA Press. pp. 701-707.

Gene Ontology Consortium. 2008. The Gene Ontology project in 2008. Nucleic Acids Res 36(Database):D440-D444.

Hill DP, Smith B, McAndrews-Hill MS, Blake JA. 2008. Gene Ontology annotations: what they mean and where they come from. BMC Bioinformatics 9(Suppl 5):S2. PMC2367625

Lovering RC, Camon EB, Blake JA, Diehl AD. 2008. Access to immunology through the Gene Ontology. Immunology 125(2):154-160.

Pena-Castillo L, [25 authors], Blake JA, [12 authors], Roth FP. 2008. A critical assessment of Mus musculus gene function prediction using integrated genomic evidence. Genome Biol 9(Suppl 1):S2. PMC2447536

Tasan M, Tian W, Hill DP, Gibbons FD, Blake JA, Roth FP. 2008. An en masse phenotype and function prediction system for Mus musculus. Genome Biol 9(Suppl 1):S8. PMC2447542

Diehl AD, Lee JA, Scheuermann RH, Blake JA. 2007. Ontology development for biological systems: immunology. Bioinformatics 23(7):913-915.

Eppig JT, Blake JA, Bult CJ, Kadin JA, Richardson JE; Mouse Genome Database Group. 2007. The mouse genome database (MGD): new features facilitating a model system. Nucleic Acids Res 35(Database):D630-D637.

Eppig JT, Blake JA, Bult CJ, Richardson JE, Kadin JA, Ringwald M; The MGI Staff. 2007. Mouse genome informatics (MGI) resources for pathology and toxicology. Toxicol Pathol 35(3):456-457.

Mouse Phenotype Database Integration Consortium. 2007. Integration of mouse phenome data resources. Mamm Genome 18(3):157-163.

Natale DA, Arighi CN, Barker WC, Blake J, Chang T-C, Hu Z, Liu H, Smith B, Wu CH. 2007. Framework for a protein ontology. BMC Bioinformatics 8(Suppl 9):S1.

Blake JA, Bult CJ. 2006. Beyond the data deluge: data integration and bio-ontologies. J Biomed Inform 39(3):314-320.

Blake JA, Eppig JT, Bult CJ, Kadin JA, Richardson JE; Mouse Genome Database Group. 2006. The Mouse Genome Database (MGD): updates and enhancements. Nucleic Acids Res 34(Database):D562-D567.

Dolan ME, Blake JA. 2006. Using ontology visualization to coordinate cross-species functional annotation for human disease genes. Proceedings. 19th IEEE International Symposium on Computer-Based Medical Systems pp. 583-587.

Gene Ontology Consortium. 2006. The Gene Ontology (GO) project in 2006. Nucleic Acids Res 34(Database):D322-D326.

Dolan ME, Ni L, Camon E, Blake JA. 2005. A procedure for assessing GO annotation consistency. Bioinformatics 21(Suppl 1):i136-i143.

Drabkin HJ, Hollenbeck C, Hill DP, Blake JA. 2005. Ontological visualization of protein-protein interactions. BMC Bioinformatics 6:29.

Eppig JT, Bult CJ, Kadin JA, Richardson JE, Blake JA; Mouse Genome Database Group. 2005. The Mouse Genome Database (MGD): from genes to mice--a community resource for mouse biology. Nucleic Acids Res 33(Database):D471-D475.

Qi D, Blake JA, Kadin JA, Richardson JE, Ringwald M, Eppig JT, Bult CJ; Mouse Genome Informatics Group. 2005. Data integration in the mouse genome informatics (MGI) database. Workshops and Poster Abstracts. 2005 IEEE Computational Systems Bioinformatics Conference pp. 37-38.

Santos C, Blake J, States DJ. 2005. Supplementary data need to be kept in public repositories. Nature 439(7079):912.
