Determining the function of genomic regulatory elements

JAX assistant professor Ryan Tewhey writing on a board while wearing PPE

When it comes to the mammalian genome, it’s intuitive to focus on the small fraction of the sequence, roughly 1.5 percent, that actually codes for proteins. When researchers began looking for genetic variants associated with particular diseases, however, they found something surprising.

The large majority of disease-associated variants lie in non-coding genomic regions that are now understood to help regulate gene expression: when a gene is active and how much it is transcribed (to produce mRNA and, ultimately, its protein product) at a given time. Variations and disruptions in the genome’s regulatory networks therefore likely contribute more to many diseases than mutations in the protein-coding genes themselves.

That said, the effort to understand the general roles of non-coding variants carries with it a significant challenge. There are many millions of locations where non-coding variants may affect function, and a vastly greater number of potential variant combinations. Figuring out how it all works is an immense task, and one that is still in its early stages. In this context, a recent paper in Nature Genetics, “Direct characterization of cis-regulatory elements and functional dissection of complex genetic associations using HCR-FlowFISH,” represents an important step forward for the field.

Utilizing HCR-FlowFISH to characterize CREs

The team, led by Jackson Laboratory (JAX) Assistant Professor Ryan Tewhey, Ph.D., Harvard University Professor and Broad Institute member Pardis Sabeti, Ph.D., and Steven Reilly, Ph.D., who will be starting his laboratory at Yale University this fall, is working to identify and characterize the function of cis-regulatory elements (CREs) as part of a multi-year NIH/NHGRI ENCODE Functional Characterization Center study. CREs are non-coding genomic sequences that control the timing and/or levels of gene expression. More than 900,000 candidate CREs have been identified based on attributes such as the binding of specific proteins or their accessibility within the genome. For almost all of them, however, direct experimental evidence of how, or even whether, they regulate gene expression is lacking.

To better characterize large numbers of CREs, the team developed a process called HCR-FlowFISH, which measures transcript abundance (and therefore gene expression levels) after CREs are perturbed using CRISPR interference (CRISPRi). HCR-FlowFISH is a broadly applicable approach to characterizing complex regulatory interactions for nearly any expressed gene and cell system. For the paper, the researchers used HCR-FlowFISH to characterize more than 300,000 perturbations, revealing that some CREs are shared by multiple genes and identifying the likely target gene(s) of specific genetic variants with functional effects. Importantly, HCR-FlowFISH offers a better signal-to-noise ratio than prior methods, and thus a lower detection threshold for low-abundance transcripts.

“We are very excited at the ability of HCR-FlowFISH to directly measure the effect all CREs within a region of the genome have on the transcription of neighboring genes,” says Tewhey. “The ability to directly observe and quantify a CRE’s effect is an important step in interpreting these non-coding regions of the genome. We hope this sensitive and robust method will help researchers studying these regions and the genetic cause of complex disease.”

In addition to these large-scale analyses, the researchers quantified CRE activity for specific genes to illustrate the complexity of their regulatory landscape. Investigating the non-coding sequences surrounding four genes, FADS1, FADS2, FADS3 and FEN1, they used HCR-FlowFISH to locate the promoters and CREs for each. They identified CREs that affected genes both individually and in combination, and found that one CRE, 18,000 base pairs downstream of the FADS2 promoter, increases expression of all four genes. The group also showed that a variant associated with cholesterol and lipoprotein levels in the general population affects gene transcription, and linked the variant to expression of FADS1 and FADS2. The results underscore the complexity of these regulatory interactions, the need for experimental characterization of CREs, and how HCR-FlowFISH can be applied to connect genetic associations to specific genes.