Systems genomicist Zhengqing Ouyang, Ph.D., is pioneering new ways to sort through the ‘big data’ in genomics, casting light on some of the most important — and previously overlooked — aspects of the human genome.
With mind-blowing speed, it is now possible to read the information that lies embedded within the human genome. Yet translating this information into meaningful knowledge, lending insight into how cells develop and how the human body is made vulnerable to disease, requires statistical and computational methods that are just as powerful as the genome sequencing technologies themselves. This is the challenge of the emerging field of genomic data sciences — to make sense of the genome’s “big data” and to blow open the bottleneck that separates technological speed from analytical might.
“The speed of data generation has increased tremendously in the past fifteen years, but the way we process and analyze these data has not kept pace,” says Zhengqing Ouyang, Ph.D., an assistant professor at The Jackson Laboratory. “So, there is a real gap.”
Ouyang aims to close this gap. Based at the JAX Genomic Medicine facility in Farmington, Connecticut, his laboratory is developing statistical and computational methods that help interpret the data flowing from genome sequencing as well as other genome-scale technologies. Just as meteorologists collect measurements of different variables in the atmosphere to develop climate models and forecast the weather, Ouyang and his colleagues gather various measurements from thousands of locations within the genome to create models that help predict how key parts of the genome function.
In addition to honing these analytical tools, he and his colleagues are also applying them to some of the thorniest questions in genomics, particularly those that surround DNA’s chemical cousin, RNA. RNA was long considered a mere messenger, carrying the instructions contained within DNA, which is sequestered in the nucleus, to other, more far-flung compartments in the cell. It has become increasingly clear, however, that RNA is much more versatile than scientists once imagined. Now, through the work of Ouyang and others, a vast molecular universe, controlled by RNA, is coming into view. Exactly how this universe behaves is still unknown, but there are some early hints that it could play important roles in human disease.
“Zhengqing is a deep thinker, he takes the time to really listen to the different sides of a scientific story,” says Charles Lee, scientific director of JAX-GM. “He brings three critical skills to JAX — broad computational and statistical knowledge, expertise in RNA biology, and the ability to think about analyses at a genome-wide level.”
As an undergraduate at Peking University in Beijing, China, Ouyang was captivated by the work of the Human Genome Project (HGP). Launched in 1990, the HGP was a sweeping, international effort to decode the human genetic blueprint. After about a decade of work and a cost of roughly $3 billion, the project reached a historic milestone: a first draft sequence of the human genome.
Although the HGP blazed new trails in biomedical research, Ouyang was even more fascinated by the informatics needs it brought to light. As the project unfolded and scientists sifted through piles of genome sequence data, it became clear that new analytical strategies were needed to make sense of the endless strings of As, Gs, Cs, and Ts emerging from the DNA sequencing machines. For example, what is the best — and fastest — way for snippets of DNA sequence to be assembled together, like pieces in a jigsaw puzzle, into a continuous whole? And how do researchers mine that full genome to unearth its key working parts, such as genes?
As an undergrad, Ouyang gained a strong background in quantitative disciplines, including mechanics and applied mathematics. With the support of the Taizhao Grant for undergraduate research, he began to apply his quantitative skills to the human genome (and many other organisms’ genomes) in the laboratory of Dr. Zhen-Su She. He was particularly interested in the quantitative modeling of biological systems, and went on to study in the interdisciplinary bioinformatics program in Peking University’s Center for Theoretical Biology (now the Center for Quantitative Biology), learning from experts in biology, chemistry, physics, and statistics.
Recognizing the importance of statistical modeling in genomic research, Ouyang joined Dr. Wing Hung Wong’s laboratory at Stanford University for his graduate studies. There, he studied embryonic stem cells and a group of specialized proteins called transcription factors. These proteins, long recognized for their regulatory roles, bind to distinct sites within the genome to control the turning on and turning off of genes (a process known as “gene expression”). Through this work, Ouyang developed a model that revealed that the expression levels of tens of thousands of genes can be accurately predicted based on the genomic binding sites of only a dozen transcription factors. This discovery led to a predoctoral training grant from the California Institute for Regenerative Medicine, allowing him to further explore gene regulatory networks in embryonic stem cells.
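To make the idea of predicting gene expression from transcription factor binding concrete, here is a toy sketch in Python. It is not Ouyang's published model: it swaps in ordinary least-squares regression, and every number in it is synthetic. The 12 "factors," the 2,000 "genes," the weights, and the noise level are all assumptions chosen only for illustration.

```python
import numpy as np

def fit_expression_model(binding, expression):
    """Least-squares fit of expression ~ intercept + binding @ weights."""
    design = np.column_stack([np.ones(len(binding)), binding])
    coef, *_ = np.linalg.lstsq(design, expression, rcond=None)
    return coef  # coef[0] is the intercept, coef[1:] are per-factor weights

def r_squared(binding, expression, coef):
    """Fraction of expression variance explained by the fitted model."""
    design = np.column_stack([np.ones(len(binding)), binding])
    residual = expression - design @ coef
    centered = expression - expression.mean()
    return 1.0 - (residual @ residual) / (centered @ centered)

# Synthetic data (assumed, not real measurements): one binding score per
# gene for each of 12 hypothetical transcription factors.
rng = np.random.default_rng(0)
n_genes, n_factors = 2000, 12
binding = rng.normal(size=(n_genes, n_factors))
true_weights = np.linspace(-1.0, 1.0, n_factors)
expression = binding @ true_weights + rng.normal(scale=0.5, size=n_genes)

# Fit on 1,500 genes, then check predictions on the 500 held out.
train, held_out = slice(0, 1500), slice(1500, None)
coef = fit_expression_model(binding[train], expression[train])
score = r_squared(binding[held_out], expression[held_out], coef)
print(f"held-out R^2: {score:.2f}")
```

Scoring the model on genes it never saw during fitting is what makes this a test of prediction rather than of curve fitting.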
Despite Ouyang’s in-depth work on transcription factors, he began to appreciate that they are not the only tricks up cells’ sleeves. That is, other regulators, including ones made of RNA, can play important roles, too.
An RNA world
The conventional view of RNA as messenger stems from a core principle in biology called the “central dogma.” First described in the late 1950s, it postulates that genetic information flows from DNA to RNA to proteins. Although the central dogma has largely weathered the test of time and scientific inquiry, it has become clear over the last decade that RNA is not simply a bystander, but a key player in the cellular machinery.
After completing his graduate training in 2010, Ouyang set out to explore these overlooked regulators as a postdoctoral fellow in the laboratories of Dr. Howard Chang and Dr. Michael Snyder at Stanford University.
Unlike DNA, which exists in cells as a two-stranded helix, RNA is single-stranded, but often pairs up with itself, forming secondary structures such as loops and hairpins. These secondary structures give RNA molecules characteristic shapes, which in turn can influence function. In some cases, changes to these secondary structures (that is, changes in RNA shape) can lead to disease. For instance, when mutations in the ferritin light chain gene land in a certain region, they can disrupt its RNA structure, leading to excessively high levels of the protein the gene encodes and to an eye disease known as hyperferritinemia-cataract syndrome.
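How a single strand can pair with itself is easier to see in a small example. The sketch below uses a classic textbook approach, a Nussinov-style dynamic program, to count the most base pairs a short RNA sequence could form when it folds back on itself. It is a deliberate simplification and not the method used in Ouyang's laboratory: it ignores folding energies and experimental structure data, and the pairing rules and the minimum loop size of three bases are assumptions made for illustration.

```python
# Bases allowed to pair in this toy model: Watson-Crick pairs plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def max_base_pairs(seq, min_loop=3):
    """Return the largest number of nested base pairs seq can form.

    min_loop is the fewest unpaired bases a hairpin loop must enclose.
    """
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the answer for the stretch seq[i..j], inclusive.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # option 1: base j stays unpaired
            for k in range(i, j - min_loop):  # option 2: base j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]

# Three Gs can pair with three Cs around a four-base loop: a small hairpin.
print(max_base_pairs("GGGAAAACCC"))
```

For the sequence shown, the three Gs at one end pair with the three Cs at the other, leaving the four As as the loop of a hairpin.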
Traditionally, RNA shapes have been determined one molecule at a time, through a slow, labor-intensive process. When Ouyang was beginning his postdoctoral work, new genome-scale technologies for studying RNA had hit the research scene, making it possible to look at RNA structure much more quickly and through a genome-wide lens.
“At that time, people began to develop high-throughput technologies to map RNA structure genome-wide, and the data were difficult to make sense of without new computational methods,” recalls Ouyang. “So I began to develop tools to analyze and derive secondary structure models for thousands of RNAs.”
That pioneering work led him to investigate an intriguing new class of RNA molecules, called long non-coding RNAs (lncRNAs). These RNAs are termed “non-coding” to distinguish them from typical “protein-coding” RNAs, which align with the central dogma and are ultimately made into proteins. (“Long” differentiates them from other, nonconventional RNAs that tend to be shorter in length.)
Exactly how these lncRNAs work, with each other and with other types of molecules, remains poorly understood, but there is early evidence that they can control gene expression. Now, as an independent investigator in his own laboratory at JAX, Ouyang is working to dissect the regulatory roles of these and other non-coding RNAs.
“By looking at their structures, we may have a better understanding of how regulatory elements in lncRNAs are combined together to perform biological functions,” says Ouyang. “In turn, that will lead to a deeper understanding of how they work in biological systems and in human disease.”
These lncRNAs are part of an immense and unexplored frontier. The so-called protein-coding portion of the genome is disarmingly small — about 2% of the human genome represents the genes that code for proteins. What makes up the remaining 98% — and how it works — will occupy the minds of scientists for decades to come.
“There’s so much of the genome that we are still uncovering,” says Lee, “including the RNA regulatory elements that Zhengqing is exploring. Indeed, we have much to learn.”
The power of prediction
One of the strengths of Ouyang’s quantitative approach is the ability to co-opt tools and methods that have been successfully applied to other problems, even those outside of biomedicine.
“In other areas, such as economics and business, there are machine learning methods that have been developed to quantitatively model a system,” explains Ouyang. “For example, when you shop online, your electronic ‘footprints’ are often recorded in a database, and the data science departments of these [e-commerce] entities mine that information and use it to generate models that predict your behavior — such as your likelihood to click on something or even to buy it.”
By developing and applying these models, and many others, to big questions in genomics, Ouyang hopes to realize a bold vision — to unlock the biological basis of how cells in the body work and what predisposes them to disease. And, just as e-commerce organizations seek to predict the actions of online shoppers, Ouyang and his colleagues hope to gather the tools and knowledge to foretell how cells will behave.
Lee thinks about the work this way: “If I was given this one cell, and I analyzed all the RNA in that cell, could I predict what cell type it is, what it is capable of doing, and what its vulnerabilities are?”
Ouyang and his team are certainly setting their sights high, but they are bolstered by the community and resources that surround them.
“Being a part of JAX Genomic Medicine, I’m very fortunate to be working with such esteemed colleagues in computation, technology, genomics, and human disease,” says Ouyang. “It’s such a unique community; I know we’ll accomplish great things.”
Nicole Davis, Ph.D., is a freelance writer and communications consultant specializing in biomedicine and biotechnology. She has worked as a science communications professional for nearly a decade and earned her Ph.D. studying genetics at Harvard University.