Blog Post September 24, 2012

Human genomes and 3-D structure: going beyond the DNA sequence


The 32 letters above represent the sort of data you get when you sequence 0.000001% of a human genome. And that’s the way we tend to think of our genomes: as linear strings of letters. At first glance, therefore, it would make sense that the terminal G is spatially farther from the first T than a G in the middle is. The reality is often very different, however.
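As a quick sanity check on that percentage, here is a minimal sketch of the arithmetic. It assumes a haploid human genome of roughly 3.2 billion base pairs, a commonly quoted round figure that the post itself does not state; the names `HAPLOID_GENOME_BP` and `percent_of_genome` are made up for illustration.

```python
# Assumption: a haploid human genome of roughly 3.2 billion base pairs.
# This round number is not given in the post; it is a commonly quoted estimate.
HAPLOID_GENOME_BP = 3_200_000_000

# Length of the example snippet described in the text.
SNIPPET_BP = 32


def percent_of_genome(n_bases, genome_bp=HAPLOID_GENOME_BP):
    """Return n_bases as a percentage of the assumed genome length."""
    return 100.0 * n_bases / genome_bp


pct = percent_of_genome(SNIPPET_BP)
print(f"{pct:.6f}%")  # prints 0.000001%
```

Under that assumption, 32 letters come out to one hundred-millionth of the genome, which matches the figure quoted above.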

The three-dimensional structure of a genome is incredibly convoluted, as it has to be to fit six feet or so of DNA (I saw ten feet in some reports about ENCODE, though I’m not sure where the number came from) into the microscopic cell nucleus. DNA winds around proteins called histones, forming structures called nucleosomes (image source: The Pennsylvania State University Department of Biochemistry and Molecular Biology), which are organized into higher-level structures that ultimately make up complete chromosomes. The DNA-protein complex, called chromatin, takes on various structures that serve several functions, including compacting the DNA enough to fit into the nucleus and controlling gene expression and DNA replication. All that winding also means that sequences thousands or even millions of base pairs apart in the linear sequence can be physically located right next to each other.
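For a rough sense of where a figure like "six feet or so" might come from, here is a back-of-the-envelope sketch. It assumes about 0.34 nanometers of length per base pair (the usual textbook value for B-form DNA) and about 3.2 billion base pairs per haploid genome. Neither number comes from this post or from the ENCODE reports, and the function name `dna_length_feet` is purely illustrative.

```python
# Assumptions (not from the post): textbook rise per base pair for B-form DNA
# and a round estimate of the haploid human genome size.
RISE_PER_BP_NM = 0.34
HAPLOID_GENOME_BP = 3.2e9
METERS_PER_FOOT = 0.3048


def dna_length_feet(n_bp, rise_nm=RISE_PER_BP_NM):
    """Stretched-out length, in feet, of n_bp base pairs of DNA."""
    meters = n_bp * rise_nm * 1e-9  # convert nanometers to meters
    return meters / METERS_PER_FOOT


haploid_ft = dna_length_feet(HAPLOID_GENOME_BP)      # one copy of the genome
diploid_ft = dna_length_feet(2 * HAPLOID_GENOME_BP)  # both copies in a typical cell

print(f"haploid: {haploid_ft:.1f} ft, diploid: {diploid_ft:.1f} ft")
```

With these inputs the estimate lands at roughly 3.6 feet for one copy of the genome and about 7 feet for the two copies in a typical cell, so "six feet or so" is in the right range. This sketch does not explain the ten-foot figure.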

The idea that the three-dimensional structure of DNA is important to its function is hardly new. Mapping where each segment is physically located within the nucleus, which elements are physically close to one another, and how they interact is a difficult task, however. And it adds another layer of difficulty to the already gargantuan task of understanding our genomes well enough to take action in beneficial and practical ways.

It was therefore of great interest to me that the recently released ENCODE data included large-scale mapping of how various distal elements, far from a gene’s location in the linear sequence, can affect transcription of that gene. Most of the media coverage focused on the activity of non-coding regions (which is related to the 3-D structure story). But learning how elements that are distant in the sequence are brought together physically, and how they interact once they are, is one of the important genomic research frontiers for which ENCODE should provide a boost.

Granted, the ENCODE data isn’t really sexy. The abstract for the most comprehensive structural analysis, “The long-range interaction landscape of gene promoters” by Sanyal, Lajoie et al., published in Nature, makes clear that it’s just a beginning: “Our results start to place genes and regulatory elements in three-dimensional context, revealing their functional relationships.” But their preliminary work provides important information. First, it had been assumed that distal regulatory sites usually interact with the gene closest to them, but according to the new data only 7% do. Second, distance does matter, and there’s a bias for interactions between elements about 120,000 bases apart. Third, long-range interactions are not blocked by intervening proteins that are known to “insulate” genes from promoter or enhancer activity. Other ENCODE papers that looked at distal regulatory elements in other contexts corroborated these findings. All of these data will give researchers a much better starting point for their work on 3-D structure.

Another element of 3-D structure is where the sequence is physically located within the nucleus. Research by Lindsay Shopland, Ph.D., at The Jackson Laboratory has shown details of how nuclear localization affects gene expression. For example, Shopland found that the nuclear periphery, thought of as a silencing compartment for gene expression, actually associates with active genes in mouse chromosomes. She is also assessing chromosomal rearrangements in lymphoma cells and how they affect chromatin 3-D folding and gene expression.

With genomics, the sequence is vital, but it may also be just the starting point. So in addition to epigenetics, the microbiome, environment, behavior and other frequently mentioned factors, we need to understand how sequence elements interact within the larger 3-D structure and how their physical location in the nucleus affects their activity and function. For now we can only guess at the importance of structure for genome function, but my hunch is that it will turn out to be significant. The ENCODE data provides a nice boost on the road to better comprehension, both through the data it presents and through its acknowledgment that structure matters.