We develop statistical and computational methodologies for chromatin and RNA genomics. Areas of particular interest include the genomic regulation of gene expression, long noncoding RNAs and epigenetic modulation, chromatin interactions and genome structure, and regulatory networks. We apply our approaches to cancer, stem cells, differentiation, and development.
With the rapid development of next-generation sequencing technologies, quantitative modeling of gene regulation at the genome scale has become an increasingly attractive problem. To understand how much of the variation in gene expression across the genome is explained by transcription factor (TF) binding, we developed the first integrative model for joint analysis of ChIP-Seq and RNA-Seq data (Ouyang Z, Zhou Q, and Wong WH, PNAS 2009). The TF-gene association strength was defined as a sum over binding peaks, with each peak weighted by its intensity and its distance to the transcription start site. We then used principal component analysis and variable selection to predict genome-wide gene expression from combinations of TFs. The model effectively captures combinatorial relationships among TFs. Applying the model to mouse embryonic stem cells, we found for the first time that the binding signals of 12 sequence-specific TFs have remarkably high predictive power for absolute mRNA abundance measured by RNA-Seq (r = 0.806). The model revealed combinatorial gene regulation, with some TFs acting mainly as activators and others acting as either activators or repressors depending on the context. Ongoing research includes developing a more comprehensive framework for transcriptional regulation through integrative statistical modeling.
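The two modeling steps described above can be illustrated with a minimal sketch. It is not the published implementation: the exponential distance decay, the decay constant `d0`, and the function names `association_strength` and `pc_regression` are assumptions made for illustration, and the variable selection step is omitted.

```python
import numpy as np


def association_strength(peaks, tss, d0=5000.0):
    """Association strength of one TF with one gene.

    peaks: iterable of (position, intensity) pairs for the TF's binding peaks.
    tss:   position of the gene's transcription start site.
    d0:    hypothetical decay constant in base pairs (an assumption).

    Each peak contributes its intensity, down-weighted by an assumed
    exponential decay in its distance to the TSS.
    """
    return sum(g * np.exp(-abs(pos - tss) / d0) for pos, g in peaks)


def pc_regression(A, y, n_components):
    """Regress expression y on the top principal components of A.

    A: (genes x TFs) matrix of association strengths.
    y: vector of expression values, one per gene.
    Returns the fitted expression values.
    """
    A_centered = A - A.mean(axis=0)
    # Rows of Vt are the principal directions of the centered matrix.
    _, _, Vt = np.linalg.svd(A_centered, full_matrices=False)
    scores = A_centered @ Vt[:n_components].T
    # Ordinary least squares with an intercept column.
    X = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef
```

In this sketch, a correlation such as the reported r would be computed between the fitted values and the measured expression, for example with `np.corrcoef(fitted, y)[0, 1]`, ideally on held-out genes.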
High-throughput technologies are greatly advancing our understanding of the regulation of RNAs, especially the large set of functionally uncharacterized noncoding RNAs. RNA regulatory information is embedded not only in the primary sequences but also in their structures. High-throughput sequencing coupled with nuclease digestion is emerging as a way to probe the structures of thousands of RNAs simultaneously. We developed a computational method for genome-scale reconstruction of RNA structure that integrates sequencing data (Ouyang Z, Snyder MP and Chang HY, Genome Research 2012). It incorporates sequencing signals in a high-dimensional classification framework to select stable structure models from the Boltzmann ensemble. In tests on a wide range of mRNAs and noncoding RNAs, our method was more accurate and robust than traditional approaches based on free energy minimization. This was the first demonstration that high-throughput sequencing is useful for accurate RNA structure reconstruction. Using the reconstructed RNA structure models of yeast and mammalian transcriptomes, we uncovered the diverse impact of RNA structure on translation efficiency, transcription initiation, and protein-RNA interactions. We are further investigating RNA regulation through systematic use of sequence and structure information.
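The idea of using sequencing signals to choose among candidate structures can be illustrated with a deliberately simplified sketch. It does not implement the classification framework of the published method: it assumes a hypothetical per-nucleotide score in which positive values suggest a paired base and negative values an unpaired one, and it simply picks the candidate (given in dot-bracket notation, as might be sampled from a Boltzmann ensemble) that agrees best with that score. The names `structure_score` and `select_structure` are made up for this example.

```python
def structure_score(dot_bracket, signal):
    """Agreement between a candidate structure and per-nucleotide signal.

    dot_bracket: structure string, '(' or ')' for paired, '.' for unpaired.
    signal:      hypothetical scores, positive = evidence for pairing,
                 negative = evidence for single-strandedness (an assumption).
    """
    if len(dot_bracket) != len(signal):
        raise ValueError("structure and signal must have the same length")
    # Reward paired positions with positive signal and unpaired positions
    # with negative signal; penalize disagreements.
    return sum(s if c in "()" else -s for c, s in zip(dot_bracket, signal))


def select_structure(candidates, signal):
    """Return the candidate structure that best agrees with the signal."""
    return max(candidates, key=lambda db: structure_score(db, signal))


# Toy example with three candidate structures for a 6-nucleotide RNA.
candidates = ["((..))", "......", "(....)"]
signal = [1.0, 1.0, -1.0, -1.0, 1.0, 1.0]
best = select_structure(candidates, signal)
```

A fuller treatment would also weigh the thermodynamic stability of each candidate, which this sketch ignores.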
Cell fate maintenance and transition are controlled by complex gene interactions. Cell-type-specific gene expression patterns reflect the dynamics of the underlying gene regulatory networks. The increasing depth of genomic profiling provides opportunities to reconstruct gene regulatory networks more comprehensively and to study their dynamic properties. We are interested in the quantitative description and statistical inference of gene regulatory networks from high-throughput genomic data, including networks at different regulatory layers such as chromatin and epigenetic regulation. We are developing and applying methods to infer gene regulatory networks in model systems.