The correct sequence of DNA bases is important, but not sufficient, for proper genomic function. In fact, the regulation of genomic activity, particularly which genes are expressed at what times, involves many more factors. For example, it’s now known that the three-dimensional (3D) organization of chromatin (the complex of DNA, proteins and chemicals that makes up chromosomes) can provide physical access to genes (open chromatin) or block that access (closed chromatin), increasing or decreasing their expression levels, respectively.
Advances in 3D genome methodology have provided researchers with the ability to map chromatin looping and interactions in unprecedented detail. The results have shown that beyond classifying chromatin simply into open or closed compartments, important sub-compartments within the larger groups exist. Using this information, Jackson Laboratory (JAX) Assistant Professor Sheng Li, Ph.D.,
and her team developed a new computational algorithm, called Sub-Compartment Identifier (SCI), to automatically and accurately predict sub-compartment locations and types within the genome.
Presented in “Graph embedding and unsupervised learning predict genomic sub-compartments from Hi-C chromatin interaction data,” published in Nature Communications, SCI uses data from Hi-C, a method that profiles chromatin interactions across the genome. Research has shown that sub-compartment structures are highly variable among different cell types—implying variable functions—so accurate identification of sub-compartments is becoming increasingly important in genomic research. SCI accurately identifies complex nuclear sub-compartments in a data-driven, unsupervised fashion.
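As a rough illustration of the general idea, and not the SCI implementation, the sketch below treats genomic bins as graph nodes and Hi-C contact counts as edge weights, gives each bin a simple spectral embedding coordinate, and groups the bins without any labels. The toy contact matrix, the function names `spectral_coordinate` and `split_by_sign`, and the choice of a spectral embedding with a sign split are all assumptions made for illustration. The split into two groups is only for brevity; a sub-compartment caller would presumably keep several embedding dimensions and cluster into more than two groups.

```python
import math


def spectral_coordinate(contacts, iterations=200):
    """Return one embedding coordinate per genomic bin (illustrative only).

    Uses the second eigenvector of the degree-normalized contact matrix,
    found by power iteration with the trivial leading eigenvector removed.
    Assumes a symmetric matrix in which every bin has at least one contact.
    """
    n = len(contacts)
    degree = [sum(row) for row in contacts]
    # Degree-normalized adjacency: M = D^-1/2 * A * D^-1/2
    norm_adj = [[contacts[i][j] / math.sqrt(degree[i] * degree[j])
                 for j in range(n)] for i in range(n)]
    # The leading eigenvector of M is proportional to sqrt(degree);
    # it carries no grouping information, so it is projected out.
    lead = [math.sqrt(d) for d in degree]
    lead_len = math.sqrt(sum(x * x for x in lead))
    lead = [x / lead_len for x in lead]

    vec = [i - (n - 1) / 2 for i in range(n)]  # deterministic start vector
    for _ in range(iterations):
        overlap = sum(v * l for v, l in zip(vec, lead))
        vec = [v - overlap * l for v, l in zip(vec, lead)]
        # Multiply by (M + I) / 2 so that every eigenvalue is non-negative.
        vec = [0.5 * (sum(norm_adj[i][j] * vec[j] for j in range(n)) + vec[i])
               for i in range(n)]
        length = math.sqrt(sum(x * x for x in vec))
        vec = [x / length for x in vec]
    return vec


def split_by_sign(coords):
    """Unsupervised two-way grouping: bins on the same side of zero share a label."""
    return [1 if c >= 0 else 0 for c in coords]


# Hypothetical contact counts: bins 0-2 contact each other often, as do bins 3-5.
toy_contacts = [
    [0, 10, 10, 1, 1, 1],
    [10, 0, 10, 1, 1, 1],
    [10, 10, 0, 1, 1, 1],
    [1, 1, 1, 0, 10, 10],
    [1, 1, 1, 10, 0, 10],
    [1, 1, 1, 10, 10, 0],
]
coords = spectral_coordinate(toy_contacts)
labels = split_by_sign(coords)
print(labels)
```

In this toy case the bins that contact each other frequently end up with the same label, which is the intuition behind grouping genomic regions by their interaction patterns rather than by predefined annotations.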
In evaluation, SCI outperformed previously developed algorithms in both identifying and classifying sub-compartments. The team also examined the functions of the predicted sub-compartments, including the epigenetic (chemical) modifications associated with active gene expression versus those associated with lower gene expression. They also assessed RNA levels, which are indicative of gene expression levels. Their findings show that SCI accurately predicted the sub-compartments within the larger open and closed chromatin groups. The work also provides insight into the characteristics of the different sub-compartments and makes it possible to better assess their distinct regulatory roles.
New research initiatives are generating robust Hi-C datasets, increasing the need for efficient analysis tools such as SCI. The findings will expand our knowledge of the complex interactions between chromatin organization, gene expression and other factors, such as epigenetic marks. Ultimately, the goal is a better understanding of the layers of gene-expression coordination that are important for development and disease states.