Understanding the non-coding genome

Ryan Tewhey (foreground, middle) in his lab at The Jackson Laboratory with his staff. Photo credit: Tiffany Laufer

JAX's Assistant Professor Ryan Tewhey, Ph.D., is leading efforts to apply high-throughput analyses to understand both how non-coding genome regulation works and its many roles in disease. 

An odd thing about our genomes is that the vast majority—about 98.5 percent—doesn’t code for proteins. For many years researchers thought that most of the non-coding regions had no function, but as sequencing improved and data emerged, it became apparent that these billions of non-coding base pairs are important after all. But if they don’t code for proteins, which carry out most cellular tasks, what do they do?

Examining non-coding genome regulation

It turns out that there are vast genomic networks that regulate when the protein-coding genes are active, and how much they are transcribed. Variations in the regulatory network may affect both the timing and the amount of protein produced. Disruptions in regulation can have a profound impact on healthy function—in fact, most genomic variants associated with complex diseases in humans are found in non-coding regions.

The sheer number of possible non-coding variations and combinations makes them extremely difficult to investigate. Researchers are therefore developing methods that allow them to quickly look at the effects of large numbers of DNA base variants. Jackson Laboratory (JAX) Assistant Professor Ryan Tewhey, Ph.D., is leading efforts to apply high-throughput analyses to understand both how non-coding genome regulation works and its many roles in disease. In a recent paper in Nature Genetics, “Prioritization of autoimmune disease-associated genetic variants that perturb regulatory element activity in T cells,” Tewhey and collaborators were able to zero in on non-coding variants associated with human autoimmune disease and investigate their effects in the laboratory.

From disease association to disruption of function

A team led by Tewhey, John Ray, Ph.D., of Benaroya Research Institute and Nir Hacohen, Ph.D., of the Broad Institute of MIT and Harvard started with human genome-wide association studies (GWAS), which identify variant locations within the genome associated with susceptibility to disease. GWAS provide a good starting point, but there can be dozens or even hundreds of variants implicated for every actual causal variant, so prioritizing them is essential. Several methods are used to pinpoint the causal variants, but each has limitations. The researchers therefore applied two methods in tandem to analyze GWAS-identified locations associated with autoimmune diseases such as type 1 diabetes, rheumatoid arthritis and inflammatory bowel disease. Massively parallel reporter assays (MPRA) test variants for their ability to affect gene expression, while chromatin accessibility data show which areas of the genome are likely to be accessible to protein binding and involved with gene expression.

“Both MPRA and chromatin accessibility data are helpful for narrowing down the GWAS variants actually associated with disease, but alone they don’t provide sufficient precision,” says Tewhey. “By combining them, we were able to enrich our selection of the actual causal variants nearly 60-fold and identify single variants associated with multiple autoimmune diseases.”  
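The idea of combining the two data types can be illustrated with a minimal Python sketch. It is not the study's actual pipeline: the variant names, the evidence flags and the "likely causal" labels are all invented toy values, and the enrichment calculation (the fraction of likely causal variants in the selected set divided by the fraction among all candidates) is one simple assumption about how such a figure could be computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A candidate GWAS variant with two lines of evidence (toy data only)."""
    name: str                # hypothetical identifier, not a real rsID
    mpra_hit: bool           # the allele changed reporter expression in an MPRA
    in_open_chromatin: bool  # the variant overlaps an accessible region
    likely_causal: bool      # invented label standing in for independent evidence


def prioritize(variants):
    """Keep only variants supported by BOTH MPRA and chromatin accessibility."""
    return [v for v in variants if v.mpra_hit and v.in_open_chromatin]


def fold_enrichment(selected, background):
    """Likely-causal fraction among selected variants relative to all candidates."""
    if not selected or not background:
        raise ValueError("both variant lists must be non-empty")
    background_rate = sum(v.likely_causal for v in background) / len(background)
    if background_rate == 0:
        raise ValueError("no likely causal variants in the background set")
    selected_rate = sum(v.likely_causal for v in selected) / len(selected)
    return selected_rate / background_rate


# Eight made-up candidates; two are labeled likely causal.
TOY_VARIANTS = [
    Variant("toy_1", True, True, True),
    Variant("toy_2", True, False, False),
    Variant("toy_3", False, True, False),
    Variant("toy_4", False, False, False),
    Variant("toy_5", True, True, False),
    Variant("toy_6", False, False, False),
    Variant("toy_7", True, False, False),
    Variant("toy_8", False, True, True),
]

if __name__ == "__main__":
    selected = prioritize(TOY_VARIANTS)
    print([v.name for v in selected])
    print(fold_enrichment(selected, TOY_VARIANTS))
```

In this toy set, requiring both kinds of evidence narrows eight candidates to two and doubles the share of likely causal variants. It also drops one likely causal variant (toy_8), which shows the trade-off any such filter makes between precision and completeness.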

One variant they focused on, known as rs72928038, is found in a non-coding region of the genome, indicating that it’s part of the regulatory network. The researchers were able to engineer it into human T cells in vitro (in laboratory culture). T cells are central to immune function, and the research team found that the variant did indeed affect the expression of an important gene, BACH2, in those cells. BACH2 is a key regulator of T cell differentiation into subclasses with different functions, and the variant reduced its expression levels.

A notable change in mice

Because the immune system and the diseases that affect it involve complex interactions among many different cell types, the researchers used the mouse as a model system to understand how a regulator of BACH2 may influence disease progression. The region containing the variant is highly conserved (very similar) between humans and mice, allowing the researchers to create mice with a small deletion in their genomes that included the variant location. In the mice, too, they found reduced Bach2 expression. In addition, their results in mice indicated that the site of the rs72928038 variant plays an important role in suppressing naive T cell activation, so that when it is disrupted the T cells are activated too early, before being directed toward a specific pathogen or non-tissue target. That finding likely explains why the variant is associated with autoimmune diseases, in which T cells attack patients’ own tissues.

“Not many mouse models have been engineered to study non-coding variants associated with disease,” says Tewhey. “It’s risky, because the effect size can be so small in isolation that it doesn’t change the physiology of the mouse. The deletion we made for the study only reduced Bach2 gene expression by about 20 percent, but fortunately it produced the autoimmune phenotype. The results provide a good proof point for expanding the use of mice to investigate non-coding variants associated with human disease identified by GWAS.”

While addressing gene mutations and protein dysfunction is obviously vital for developing therapies for many diseases, about 90 percent of the variants in GWAS data for common complex diseases are non-coding. Researchers are just beginning to explore the non-coding regulatory mechanisms, and the functional consequences of specific variants are almost entirely unknown. This study therefore represents an exciting step forward in genomic research, spanning the gap between human GWAS data and experimental exploration of a non-coding variant implicated in disease.