The advent of genome-scale biology has provided biologists with enormous amounts of data to analyze, understand and incorporate into ever-improving models of how organisms function at a molecular level. A mammalian genome comprises some tens of thousands of genes and an equal or greater number of additional functional elements. Our research centers on an improved understanding of the interactions between these elements and how modification or disruption of the interactions can lead to developmental problems or disease. Our studies are conducted both at a global level, characterizing complex interaction networks, and at smaller scales, where we focus on mechanisms that affect the processing and regulation of specific genes or groups of genes at the mRNA transcript stage of expression.
Integrated Studies of Gene Regulation and Genome Organization
Probe-level microarray analysis reveals functional differences between transcript isoforms
We recently developed a new analysis to identify differences in mRNA processing between pairs of samples, based on a probe-level analysis of microarray data. Standard microarrays typically assay the expression of each gene via hybridization to a set of 10-15 probes that target distinct positions within its mRNA transcript, measuring the hybridization signal at each probe. The typical data analysis for these arrays summarizes the signal at all probes into a single expression level for the gene in each sample. Our analysis eliminates the summarization, instead focusing on non-uniform changes in signal across the probeset, and specifically identifying the boundaries between contiguous groups of probes with distinct expression changes. We interpret these blocks as a change in the relative abundance of isoforms that are sampled by the probes; however, as described below, our analysis has also provided evidence of mRNA cleavage events. Our work to date has focused primarily on the Affymetrix Mouse 430 gene expression chip, which, due to its oligo-dT cDNA priming, has probes targeted primarily to the 3'-end of the gene, specifically including the 3'-untranslated region (3'-UTR). This position bias makes alternative polyadenylation, with accompanying changes to the 3'-UTR, the phenomenon most likely to be identified.
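The idea of locating a boundary between contiguous probe groups can be illustrated with a minimal sketch. It is not the published method: it assumes per-probe log-ratios between two samples, ordered by position along the transcript, and uses a simple least-squares single-changepoint search. The function name `best_boundary` and the example values are hypothetical.

```python
from typing import List, Tuple


def best_boundary(log_ratios: List[float]) -> Tuple[int, float]:
    """Find the split index k that best divides the probes into two
    contiguous blocks, [0:k] and [k:], each with its own mean change.

    Returns (k, gain), where gain is the reduction in squared error
    compared with treating the whole probeset as one uniform block.
    """
    n = len(log_ratios)
    if n < 2:
        raise ValueError("need at least two probes")

    def sse(values: List[float]) -> float:
        # Sum of squared deviations from the block mean.
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    total = sse(log_ratios)
    best_k, best_cost = 1, float("inf")
    for k in range(1, n):
        cost = sse(log_ratios[:k]) + sse(log_ratios[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, total - best_cost


# Hypothetical probeset: the six 5'-most probes are unchanged, while the
# five 3'-most probes drop by roughly one log unit.
probes = [0.1, -0.1, 0.0, 0.1, 0.0, -0.1, -1.1, -0.9, -1.0, -1.0, -1.1]
k, gain = best_boundary(probes)
print(f"boundary before probe index {k}, squared-error gain {gain:.2f}")
```

A large gain relative to probe-level noise would flag the gene as a candidate for a processing change; a real analysis would also need a significance test and support for more than one boundary.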
Working with microarray data provided by Dr. John Eppig of The Jackson Laboratory, we applied our method to the study of transcript degradation during the transition from germinal vesicle (GV) oocytes to metaphase II (MII) arrested, ovulated oocytes. The analysis revealed several hundred genes with multiple isoforms of differential stability in the GV-MII transition, including genes where an extended 3'-UTR correlates with increased stability, and vice versa. Further details of the degradation were revealed through a comparison of microarrays hybridized with MII cDNAs generated from random and oligo-dT primers, respectively. Several hundred genes showed processing changes that could be detected only with random-primed cDNA, indicating the absence of a polyA tail at the 3'-end of the transcript. Sequence analysis of a sample of the genes with such processing revealed a distinct targeting sequence consistent with the seed region of two microRNAs that are expressed in the oocyte.
Systematic changes in mRNA processing characterize tumors with distinct prognoses
Working in collaboration with Dr. Kevin Mills of The Jackson Laboratory, we recently demonstrated that the gene expression changes during tumorigenesis in progenitor B cells from mouse models of human cancer include widespread and systematic changes in mRNA processing. Our results were obtained through application of the probe-level microarray analysis described above. An internal cross-validation analysis showed that histologically indistinguishable tumors with different prognostic outcomes could be classified with up to 80% accuracy. Western blot analysis of selected genes demonstrated that changes in transcript isoform could be correlated with changes in protein expression.
Existing models of tumorigenesis associated with genomic instability focus largely on the identification of amplified oncogenes and the downstream effects of the associated increase in expression of these genes. Our work provides an important caveat to these studies, in that two of the models that we study share a common amplified oncogene (c-myc), yet have different prognoses. Critically, our analysis of these models, focused on changes in mRNA processing, reveals characteristic differences that were missed with a standard summarized microarray analysis. Our work suggests a determining role for the cell's response to DNA damage, both in the type and amount of such damage. This work is consistent with, but also extends, previous studies that revealed changes in mRNA processing associated with DNA damage.
Our work provides an important companion to the many studies now being performed and published on the importance of changes in microRNA expression in tumorigenesis. Our results support a role for systematic changes in polyA site selection, and accordingly in the 3'-UTR, during tumorigenesis. Since microRNAs predominantly target 3'-UTRs, changes in polyA site selection have the potential to control whether or not a transcript is targeted by the microRNAs with altered expression. Indeed, avoidance of microRNA-mediated regulation has the potential to contribute to the cell-level selection necessary for tumorigenesis.
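The link between polyA site choice and microRNA targeting can be made concrete with a small sketch. It assumes the simplest targeting model, an exact match to the complement of microRNA nucleotides 2-8 (the seed), and uses the let-7a sequence only as a familiar example. The 3'-UTR sequence, the polyA positions, and the function names are hypothetical and chosen purely for illustration.

```python
def seed_site(mirna: str) -> str:
    """Return the DNA-alphabet target site that is the reverse complement
    of microRNA nucleotides 2-8 (the seed)."""
    seed = mirna[1:8].upper().replace("U", "T")
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seed))


def is_targeted(utr: str, polya_end: int, site: str) -> bool:
    """True if the seed site lies within the 3'-UTR retained by an isoform
    whose cleavage and polyadenylation occur at position polya_end."""
    return site in utr[:polya_end]


let7a = "UGAGGUAGUAGGUUGUAUAGUU"
site = seed_site(let7a)

# Hypothetical 3'-UTR: a proximal polyA site ends the short isoform at
# position 20; the seed site lies only in the extension of the long isoform.
utr = "GCATTGACAATAAAGTCTGG" + "TTCTACCTCAGG" + "AATAAACTTG"
proximal_end, distal_end = 20, len(utr)

print(site)
print("short isoform targeted:", is_targeted(utr, proximal_end, site))
print("long isoform targeted:", is_targeted(utr, distal_end, site))
```

Under these assumptions a shift toward the proximal site removes the target site from the transcript, which is the escape from microRNA-mediated regulation described above.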
Exploring non-uniformity in gene distribution on the mouse genome: gene deserts
We are interested in non-uniformities of the distribution of protein-coding genes along the chromosomes of the mouse genome. Of specific interest are the largest gaps, commonly referred to as "gene deserts." Our work includes the development of a novel means of defining the gene deserts based on a dynamic programming (DP) analysis of the local gene density. The DP basis of our analysis provides a robust, rigorous means of identifying deserts even in the event of changing genomic annotations. The deserts defined this way are based on a local gene density rather than an absolute absence of genes and therefore have the intriguing capacity to include genes within the regions defined as deserts. Studies of overlap with other block-like features (e.g., syntenic blocks, linkage disequilibrium blocks) have shown statistically significant intersections, but only on specific individual chromosomes.
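How a dynamic programming analysis of local gene density can yield deserts that still contain genes is sketched below. This is an illustration under stated assumptions, not the algorithm used in our work: the chromosome is divided into fixed-size bins with a gene count per bin, each bin is labeled "desert" or "genic", and the DP minimizes squared deviation from an assumed expected count per state plus a penalty for each change of state. All rates, the penalty, and the name `label_deserts` are hypothetical.

```python
from typing import List


def label_deserts(counts: List[int],
                  desert_rate: float = 0.2,
                  genic_rate: float = 5.0,
                  switch_penalty: float = 8.0) -> List[int]:
    """Label each bin 0 (desert) or 1 (genic).

    The labeling minimizes the sum over bins of
    (count - expected count of the bin's state)^2, plus switch_penalty
    for every change of state between neighboring bins.
    """
    n = len(counts)
    if n == 0:
        return []
    rates = (desert_rate, genic_rate)
    cost = [[0.0, 0.0] for _ in range(n)]   # best cost ending in each state
    back = [[0, 0] for _ in range(n)]       # previous state on the best path

    for s in (0, 1):
        cost[0][s] = (counts[0] - rates[s]) ** 2
    for i in range(1, n):
        for s in (0, 1):
            emit = (counts[i] - rates[s]) ** 2
            stay = cost[i - 1][s]
            switch = cost[i - 1][1 - s] + switch_penalty
            if stay <= switch:
                cost[i][s], back[i][s] = stay + emit, s
            else:
                cost[i][s], back[i][s] = switch + emit, 1 - s

    # Trace back from the cheaper final state.
    state = 0 if cost[-1][0] <= cost[-1][1] else 1
    labels = [state]
    for i in range(n - 1, 0, -1):
        state = back[i][state]
        labels.append(state)
    labels.reverse()
    return labels


# Hypothetical gene counts per bin; the sixth bin holds a single gene.
bins = [6, 4, 5, 0, 0, 1, 0, 0, 5, 7]
labels = label_deserts(bins)
print(labels)
```

Because the penalty discourages short interruptions, the isolated gene in the sixth bin is absorbed into the surrounding desert rather than splitting it, mirroring the density-based definition described above.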
Our studies of the genes located within (and near) the gene deserts reveal statistical enrichment for functions in the related broad classes of cell-to-cell communication, cell-to-cell adhesion, and, strikingly, neurogenesis. In particular, the communication and adhesion genes include a large number from the broad cadherin family, whose members are found in both desert and non-desert contexts within the genome. Nearly complete studies within these families are assessing the role of genomic context in specifying the breadth of expression patterns among gene family members.
Tracking and cataloguing variation in the mouse genome: CGDSNPdb
We have recently released the Center for Genome Dynamics Single Nucleotide Polymorphism database (CGDSNPdb), a high-quality, open-source, curated mouse SNP database with more than 8 million SNPs, drawn from multiple sources, from 74 inbred strains of laboratory mice. All SNPs have passed quality-control checks and are annotated with properties specific to the SNP as well as those implied by changes to overlapping protein-coding genes. CGDSNPdb also serves as the interface tool to the "CGD imputed SNP resource," in which a Hidden Markov Model (HMM) was used to assess local haplotypes at several million genomic loci in tens of strains of mice. The imputed SNP calls may be searched, retrieved and analyzed identically to the experimentally verified SNPs, with additional information, such as the HMM likelihood score, provided in the query return. CGDSNPdb is accessible online via a web-based query tool (http://www.genomedynamics.org/cgdsnpdb/) and also via a MySQL public login. The search engine supports a number of different queries, including search by chromosome region(s), nearby gene annotations or SNP accession ID. Results can be returned in multiple popular formats.
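A search by chromosome region might look like the sketch below. The table name, column names, accession IDs, and values are all hypothetical and do not reflect the actual CGDSNPdb schema; an in-memory SQLite database stands in for the MySQL server so that the example is self-contained.

```python
import sqlite3

# In-memory stand-in for the database; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE snp (
           accession  TEXT PRIMARY KEY,
           chrom      TEXT,
           pos        INTEGER,
           is_imputed INTEGER,   -- 1 for HMM-imputed calls, 0 otherwise
           hmm_score  REAL       -- NULL for experimentally verified SNPs
       )"""
)
conn.executemany(
    "INSERT INTO snp VALUES (?, ?, ?, ?, ?)",
    [
        ("snp_a", "11", 1000, 0, None),
        ("snp_b", "11", 2500, 1, 0.97),
        ("snp_c", "11", 9000, 0, None),
        ("snp_d", "4", 2000, 1, 0.88),
    ],
)


def snps_in_region(conn, chrom, start, end):
    """Return (accession, pos, is_imputed, hmm_score) rows for every SNP
    on chrom with start <= pos <= end, ordered by position."""
    cursor = conn.execute(
        "SELECT accession, pos, is_imputed, hmm_score FROM snp "
        "WHERE chrom = ? AND pos BETWEEN ? AND ? ORDER BY pos",
        (chrom, start, end),
    )
    return cursor.fetchall()


hits = snps_in_region(conn, "11", 500, 3000)
for row in hits:
    print(row)
```

Imputed and experimentally verified SNPs come back from the same query, with the likelihood score populated only for the imputed calls, which parallels the behavior described above.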
Principal Investigator: Joel H. Graber, Ph.D.
Scientific Software Engineer: Lucie Hutchins, B.S.
Software Engineers: Nazira Bektassova, B.S., Matt Vincent, B.S.
Collaborators: Carol J. Bult, Ph.D.; Gary A. Churchill, Ph.D.; Wilhelmine de Vries, Ph.D.; John J. Eppig, Ph.D.; Wayne N. Frankel, Ph.D.; Barbara B. Knowles, Ph.D.; Kenneth Paigen, Ph.D.; Anne E. Peaston, Ph.D.; Petko M. Petkov, Ph.D.; Kevin D. Mills, Ph.D.; Clifford J. Rosen, M.D.; Lindsay S. Shopland, Ph.D.; Thomas Blumenthal, Ph.D., University of Colorado, Boulder; Keith W. Hutchison, Ph.D., University of Maine, Orono; Clinton C. MacDonald, Ph.D., Texas Tech University; Janet Rowley, M.D., University of Chicago
Research Administrative Assistant: Tonnya Norwood, B.S.
Tian B, Graber JH. 2012. Signals for pre-mRNA cleavage and polyadenylation. Wiley Interdiscip Rev RNA 3(3):385-396.
Billings T, Sargent EE, Szatkiewicz JP, Leahy N, Kwak IY, Bektassova N, Walker M, Hassold T, Graber JH, Broman KW, Petkov PM. 2010. Patterns of recombination activity on mouse chromosome 11 revealed by high resolution mapping. PLoS One 5(12):e15340. PMC2999565
Hutchins LN, Ding Y, Szatkiewicz JP, Von Smith R, Yang H, de Villena FP, Churchill GA, Graber JH. 2010. CGDSNPdb: a database resource for error-checked and imputed mouse SNPs. Database (July 6):baq008. PMC2911843
Kim H, Erickson B, Luo W, Seward D, Graber JH, Pollock DD, Megee PC, Bentley DL. 2010. Gene-specific RNA polymerase II phosphorylation and the CTD code. Nat Struct Mol Biol 17:1279-1286. PMC3048030
Peaston AE, Graber JH, Knowles BB, de Vries WN. 2010. Interrogating the transcriptome of oocytes and preimplantation embryos. Methods Enzymol 477:481-510.
Salisbury J, Hutchison KW, Wigglesworth K, Eppig JJ, Graber JH. 2009. Probe-level analysis of expression microarrays characterizes isoform-specific degradation during mouse oocyte maturation. PLoS One 4(10):e7479. PMC2759528
Yang H, Ding Y, Hutchins LN, Szatkiewicz J, Bell TA, Paigen BJ, Graber JH, de Villena FP, Churchill GA. 2009. A customized and versatile high-density genotyping array for the mouse. Nat Methods 6(9):663-666. PMC2735580
Singh P, Alley TL, Wright SM, Kamdar S, Schott W, Wilpan RY, Mills KD, Graber JH. 2009. Global changes in processing of mRNA 3' untranslated regions characterize clinically distinct cancer subtypes. Cancer Res 69(24):9422-9430. PMC2794997
De Vries WN, Evsikov AV, Brogan LJ, Anderson CP, Graber JH, Knowles BB, Solter D. 2008. Reprogramming and Differentiation in Mammals: Motifs and Mechanisms. Cold Spring Harb Symp Quant Biol 73. PMC2735112
Hutchins LN, Murphy SM, Singh P, Graber JH. 2008. Position-dependent motif characterization using non-negative matrix factorization. Bioinformatics 24(23):2684-2690. PMC2639279
Paigen K, Szatkiewicz JP, Sawyer K, Leahy N, Parvanov ED, Ng SH, Graber JH, Broman KW, Petkov PM. 2008. The recombinational anatomy of a mouse chromosome. PLoS Genet 4(7):e1000119. PMC2440539
Graber JH, Salisbury J, Hutchins LN, Blumenthal T. 2007. C. elegans sequences that control trans-splicing and operon pre-mRNA processing. RNA 13(9):1409-1426. PMC1950753
Liu D, Brockman JM, Dass B, Hutchins LN, Singh P, McCarrey JR, MacDonald CC, Graber JH. 2007. Systematic variation in mRNA 3'-processing signals during mouse spermatogenesis. Nucleic Acids Res 35(1):234-246. PMC1802579
Petkov PM, Graber JH, Churchill GA, DiPetrillo K, King BL, Paigen K. 2007. Evidence of a large-scale functional organization of mammalian chromosomes. PLoS Biol 5(5):e127.
Brown AC, Lerner CP, Graber JH, Shaffer DJ, Roopenian DC. 2006. Pooling and PCR as a method to combat low frequency gene targeting in mouse embryonic stem cells. Cytotechnology 51(2):81-88.
Evsikov AV, Graber JH, Brockman JM, Hampl A, Holbrook AE, Singh P, Eppig JJ, Solter D, Knowles BB. 2006. Cracking the egg: molecular dynamics and evolutionary aspects of the transition from the fully grown oocyte to embryo. Genes Dev 20:2713-2727.
Graber JH, Churchill GA, DiPetrillo KJ, King BL, Petkov PM, Paigen K. 2006. Patterns and mechanisms of genome organization in the mouse. J Exp Zoolog 305A(9):683-688.
Liu D, Graber JH. 2006. Quantitative comparison of EST libraries requires compensation for systematic biases in cDNA generation. BMC Bioinformatics 7:77.
Salisbury J, Hutchison KW, Graber JH. 2006. A multispecies comparison of the metazoan 3'-processing downstream elements and the CstF-64 RNA recognition motif. BMC Genomics 7:55.
Brockman JM, Singh P, Liu D, Quinlan S, Salisbury J, Graber JH. 2005. PACdb: PolyA cleavage site and 3'-UTR database. Bioinformatics 21:3691-3693.
Petkov PM, Graber JH, Churchill GA, DiPetrillo KJ, King BL, Paigen K. 2005. Evidence of a large-scale functional organization of mammalian chromosomes. PLoS Genet 1(3):e33.