Our laboratory investigates gene regulatory networks in cellular processes using computational and statistical approaches. We develop quantitative models and statistical learning methods for genomics. This work involves analyzing "big data" from cutting-edge sequencing technologies and integrating various types of high-throughput genomic datasets. We have a particular interest in modeling genome regulation, including but not limited to transcription, noncoding RNA regulation, and chromatin organization.
Integrative transcription modeling
With the rapid development of next-generation sequencing technologies, quantitative modeling of gene regulation at the genome scale is becoming an increasingly compelling problem. To understand how much of the variation in gene expression across the genome is explained by transcription factor (TF) binding, we developed the first integrative model for joint analysis of ChIP-Seq and RNA-Seq data (Ouyang Z, Zhou Q, and Wong WH, PNAS 2009). The TF-gene association strength was defined by summing the binding peaks, weighted by their intensities and their distances to transcription start sites. We then used principal component analysis and variable selection to predict genome-wide gene expression from combinations of TFs. The model effectively captures combinatorial relationships among TFs. Applying the model to mouse embryonic stem cells for the first time, we found that the binding signals of 12 sequence-specific TFs predict absolute mRNA abundance measured by RNA-Seq with remarkably high accuracy (r = 0.806). The model revealed combinatorial gene regulation, with some TFs acting mainly as activators and others acting as either activators or repressors depending on the context. Ongoing research includes developing a more comprehensive framework for transcriptional regulation through integrative statistical modeling.
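The two steps described above (a distance-weighted TF-gene association score, followed by regression on principal components) can be sketched roughly as follows. This is an illustrative sketch only, not the published implementation: the exponential decay form, the decay constant `d0`, and the function names `tf_gene_association` and `pc_regression` are assumptions introduced here, and the variable selection step is omitted.

```python
import numpy as np


def tf_gene_association(peak_pos, peak_intensity, tss_pos, d0=5000.0):
    """Association strength of one TF with each gene.

    Sums peak intensities, down-weighted by distance to each gene's TSS.
    ASSUMPTION: the weight exp(-distance / d0) and the default d0 are
    illustrative choices; the text only states that peaks are weighted
    by intensity and by distance to the TSS.
    """
    peak_pos = np.asarray(peak_pos, dtype=float)
    peak_intensity = np.asarray(peak_intensity, dtype=float)
    tss_pos = np.asarray(tss_pos, dtype=float)
    # dist[i, k] = distance between the TSS of gene i and peak k
    dist = np.abs(tss_pos[:, None] - peak_pos[None, :])
    weights = np.exp(-dist / d0)
    return (peak_intensity[None, :] * weights).sum(axis=1)


def pc_regression(assoc, expression, n_components):
    """Least-squares regression of expression on top principal components.

    assoc: (genes x TFs) matrix of association strengths.
    expression: per-gene expression values (e.g. log-transformed).
    Returns the coefficients, fitted values, and the Pearson correlation
    between fitted and observed expression.
    """
    assoc = np.asarray(assoc, dtype=float)
    expression = np.asarray(expression, dtype=float)
    centered = assoc - assoc.mean(axis=0)
    # Principal component scores from the SVD of the centered matrix
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    design = np.column_stack([np.ones(len(expression)), scores])
    coef, *_ = np.linalg.lstsq(design, expression, rcond=None)
    fitted = design @ coef
    r = np.corrcoef(fitted, expression)[0, 1]
    return coef, fitted, r
```

In use, one would compute `tf_gene_association` once per TF, stack the resulting vectors as columns of a genes-by-TFs matrix, and pass that matrix with the measured expression values to `pc_regression`.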
Decoding RNA regulatory information
High-throughput sequencing is greatly advancing our understanding of RNA regulation, especially for the large set of functionally uncharacterized noncoding RNAs. RNA regulatory information is embedded not only in primary sequences but also in RNA structures. High-throughput sequencing coupled with nuclease digestion is emerging as a way to probe the structures of thousands of RNAs simultaneously. We developed a computational method for genome-scale reconstruction of RNA structure that integrates sequencing data (Ouyang Z, Snyder MP and Chang HY, Genome Research 2012). It incorporates sequencing signals in a high-dimensional classification framework to select stable structure models from the Boltzmann ensemble. Tested on a wide range of mRNAs and noncoding RNAs, our method was more accurate and robust than traditional approaches based on free energy minimization. This was the first demonstration that high-throughput sequencing is useful for accurate RNA structure reconstruction. Using the reconstructed RNA structure models of yeast and mammalian transcriptomes, we uncovered the diverse impact of RNA structure on translation efficiency, transcription initiation, and protein-RNA interactions. We are further investigating RNA regulation by systematically using sequence and structure information.
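The general idea of using sequencing signals to choose among candidate structures can be illustrated with a deliberately simplified sketch. It is not the published method, which uses a high-dimensional classification framework: here each candidate is simply scored by how well its paired and unpaired positions agree with a per-nucleotide signal. The function name `select_structure`, the dot-bracket input, and the sign convention of the signal (positive values suggest a paired base, negative values an unpaired one) are all assumptions made for illustration, and the candidates are assumed to have been sampled from the Boltzmann ensemble by an external tool.

```python
def select_structure(candidates, pairing_signal):
    """Pick the candidate structure that best agrees with the signal.

    candidates: dot-bracket strings of equal length, e.g. "((..))".
    pairing_signal: one number per nucleotide. ASSUMPTION: positive
    values are evidence for a paired base, negative for unpaired.
    This simple agreement score stands in for the classification
    framework described in the text.
    """
    n = len(pairing_signal)
    for structure in candidates:
        if len(structure) != n:
            raise ValueError("structure and signal lengths differ")

    def agreement(structure):
        # Reward paired positions with positive signal and
        # unpaired positions with negative signal.
        return sum(
            sig if char in "()" else -sig
            for char, sig in zip(structure, pairing_signal)
        )

    return max(candidates, key=agreement)
```

For example, with candidates `"((..))"`, `"......"`, and `"(....)"` and the signal `[1, 1, -1, -1, 1, 1]`, the first candidate has the highest agreement score and is selected.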
Gene regulatory network reconstruction
Cell fate maintenance and transition are controlled by complex gene interactions. Cell-type-specific gene expression patterns reflect the dynamics of the underlying gene regulatory networks. The increasing depth of genomic profiling provides opportunities to reconstruct gene regulatory networks more comprehensively and to study their dynamic properties. We are interested in the quantitative description and statistical inference of gene regulatory networks from high-throughput genomic data. We are also interested in gene regulatory networks at different layers, such as chromatin and epigenetic regulation. We are developing and applying methods to infer gene regulatory networks in model systems.
Postdoctoral Associates: Chenchen Zou, Ph.D.; Yizhou Li, Ph.D.
Wan Y, Qu K, Ouyang Z, and Chang HY (2013) Genome-wide mapping of RNA structure using nuclease digestion and high throughput sequencing. Nature Protocols, 8:849-869.
Ouyang Z†, Snyder MP, and Chang HY† (2012) SeqFold: Genome-scale reconstruction of RNA secondary structure integrating high-throughput sequencing data. Genome Research. Advance Online Access October 11, 2012. †co-corresponding authors.
Wan Y, Qu K*, Ouyang Z*, Kertesz M, Li J, Tibshirani R, Nutter RC, Segal E, and Chang HY (2012) Genome-wide measurement of RNA folding energies. Molecular Cell, 48:1-13. *equal contribution.
Gerstein MB, ..., Ouyang Z, ..., and Snyder MP (2012) Architecture of the human regulatory network derived from ENCODE data. Nature, 489:91-100.
ENCODE Project Consortium, ..., Ouyang Z, ..., Birney E (2012) An integrated encyclopedia of DNA elements in the human genome. Nature, 489:57-74.
Heffelfinger C, Ouyang Z, Engberg A, Leffell DJ, Hanlon AM, Gordon PB, Zheng W, Zhao H, Snyder MP, Bale AE (2012) Correlation of global microRNA expression with basal cell carcinoma subtype. G3: Genes, Genomes, Genetics, 2:279-86.
Pan Y, Ouyang Z, Wong WH, Baker JC (2011) A new FACS approach isolates hESC derived endoderm using transcription factors. PLoS ONE, 6, e17536.
Ouyang Z, Zheng GX, Chang HY (2010) Noncoding RNA landmarks of pluripotency and reprogramming. Cell Stem Cell, 7:649-50.
Lee EY, Ji H, Ouyang Z, Zhou B, Ma W, Vokes SA, McMahon AP, Wong WH, Scott MP (2010) Hedgehog pathway-regulated gene networks in cerebellum development and tumorigenesis. Proceedings of the National Academy of Sciences USA, 107, 9736-41.
Ouyang Z, Zhou Q, Wong WH (2009) ChIP-seq of transcription factors predicts absolute and differential gene expression in embryonic stem cells. Proceedings of the National Academy of Sciences USA, 106:21521-6.
Xing Y, Ouyang Z, Kapur K, Scott MP, Wong WH (2007) Assessing the conservation of mammalian gene expression using high-density Exon Arrays. Molecular Biology and Evolution, 24:1283-1285.
Kapur K, Xing Y, Ouyang Z, Wong WH (2007) Exon arrays provide accurate assessments of gene expression. Genome Biology, 8:R82.
Li E, Ouyang Z, Deng X, Zhang Y, Chen W (2005) Parallel implementation of SEMPHY: a structural EM algorithm for phylogenetic reconstruction. Parallel Computing'05, 631-8.
Ouyang Z, Liu JK, She ZS (2005) Hierarchical structure analysis describing abnormal base composition of genomes. Physical Review E, 72, 041915.
Zhu H, Hu G, Ouyang Z, Wang J, and She ZS (2004) Accuracy improvement for identifying translation initiation sites in microbial genomes. Bioinformatics, 20, 3308-17.
Ouyang Z, Zhu H, Wang J, She ZS (2004) Multivariate entropy distance method for prokaryotic gene identification. Journal of Bioinformatics and Computational Biology, 2, 353-73.
Ouyang Z, Wang C, and She ZS (2004) Scaling and hierarchical structures in DNA sequences. Physical Review Letters, 93, 078103.
She ZS, Yang Z, Ouyang Z, Zhu H, Wang C, and Yin J (2003) A preliminary study to the origin and evolution of SARS-CoV. Acta Scientiarum Naturalium Universitatis Pekinensis, 39, 809-14.