2018 will mark the 15th anniversary of the publication of the first human genome sequence. Those 3.2 billion letters were so hard-won, and they promised so much just as the millennium began. So where are we now? What do we know, and what can we do, after a decade and a half of human genetics and genomics research that has brought amazing progress, and perhaps an equal amount of frustration? The field’s current status was showcased at the recent annual meeting of the American Society of Human Genetics (ASHG), along with tantalizing glimpses of an exciting future.
There’s an expectation at ASHG of seeing something new and shiny that drops jaws and has the potential to change how research is conducted. This year lacked that one big thing, though. Instead, the prevailing expectation was that research progress should start translating into clinical impact across more disease areas and being applied in more sophisticated ways. In other words, our capabilities are unprecedented and have yielded wonderful biological insight in the research realm, but what can we actually do with all that ability and knowledge in the clinic?
There’s no doubt that there have been some splashy headlines recently about new clinical advances. A procedure using in vivo gene editing to treat Hunter syndrome, a metabolic condition, was recently reported; interestingly enough, it relies on zinc finger nucleases rather than CRISPR. In October, an FDA advisory committee recommended approval of a gene replacement therapy for a rare inherited form of retinal dystrophy that causes blindness. The therapy delivers a functional copy of the gene RPE65 to patients’ eyes via injection of a viral vector, and early testing yielded improved vision in 93% of the patients who received it. Cancer gene therapies are also advancing rapidly, with CAR-T therapies leading the way: a cancer patient’s T cells are removed, engineered in culture to recognize and attack the cancer, and then infused back into the patient. These treatments are narrowly focused on relatively rare conditions, however, and they are still in the early stages of rollout.
At the meeting, presenters rolled out data and discoveries based on established methods and technologies, showcasing how those tools have been refined and matured. As a result, many sessions dealt with next steps that make data more useful, if not yet translatable. For example, sequencing has yielded ever-larger datasets, but they are often still siloed or stored in proprietary formats, making it extremely difficult to share data and build a sufficiently large sample size (N) for robust analyses. Efforts such as gnomAD, a publicly accessible database containing about 123,000 exome and 15,500 whole genome sequences, have begun to break down the silos. Also, now that so many genomic variants have been identified, systematic efforts to characterize what they actually mean, and to shrink the vast number of so-called “variants of unknown significance,” are well underway. In addition, we’re getting a better handle on which results to report to sequenced patients, and how, including secondary findings.
On the other hand, some sessions explored issues showing that we’re just starting to get a handle on many other research puzzles. Topics included what the vast amount of non-coding DNA in the genome does, how to handle mosaicism and disease in the era of single-cell analysis, how to study diseases such as type 2 diabetes that involve highly complex genetic and environmental risk factors, how three-dimensional genome structure influences gene expression, and much more.
There remains a gap between early “power of genetics” claims and the availability of tested and validated gene-based therapies for common conditions. Unfortunately, the gap is starting to be filled by “genetic” consumer products, some of which are dubious at best. So when will real genetic products and treatments be available, not just for cancers and rare diseases but also for common, complex diseases, or even traits? It’s hard to know, but there’s good reason to expect a lot of progress in the next several years. That progress won’t come from tests that promise to help you hone your athletic talent or probiotics claimed to improve your mood, mind you, but recent discoveries do provide hope for clinical advances.
One difficult hurdle for genetic therapies is delivery. It’s great to be able to edit genetic sequences, but how can you get the editing machinery into trillions of cells to actually do it? CRISPR has indeed yielded treatments that are nearing clinical testing, but it’s no surprise that the one closest to fruition is also delivered to the retina, a small and relatively accessible tissue for viral vector delivery. But what if you could fend off disease through genetic means without changing the sequence? For example, recent work showed that dysfunction in aging immune cells is likely driven by changes in chromatin accessibility: some genes become more accessible and others less so, changing which ones get expressed. If those changes could be halted and the chromatin patterns kept “young,” perhaps immune responses could be maintained with no sequence changes needed.
Another area of great interest and promise is RNA biology, which has proven to be far more important and powerful than was understood for many decades. Supplying manufactured messenger RNA (mRNA) that codes for a desired protein (or, for infectious diseases, an antibody) would be another way to avoid the delivery hurdle. Unfortunately, foreign mRNAs provoke an immune response that can be harmful in addition to blocking any therapeutic effect, but recent advances in how mRNAs are constructed are addressing the issue. In fact, mRNA therapy is entering early-phase testing for personalized cancer vaccines designed from tumor sequencing data, an effort to watch closely over the next couple of years.
Finally, large-scale projects are beginning to assemble the huge amounts of patient data that the new technologies allow us to obtain. Those data, including medical history and physical traits as well as sequence data, need to be collected systematically to yield the greatest insight and potential benefit. One such resource, the UK Biobank, went live with an accessible 500,000-person data set earlier this year. The biobank in its current form has limitations, in that its participants have so far only been genotyped, with sequencing just getting underway, but it has already yielded some important findings. Closer to home, the Geisinger Health System’s MyCode Community Health Initiative, which has enrolled more than 150,000 volunteers to date, is matching exome sequences with health data within its patient population in Pennsylvania. Findings from roughly the first 50,000 enrollees have provided robust support for this kind of precision medicine program. Notably, 3.5% of those participants were found to carry a clinically actionable genetic variant, and there have been many instances of early disease diagnosis in patients who presented as healthy. Geisinger just announced that it’s looking to expand its MyCode methodologies to a national scale.
In the end, the lack of dramatic new research tools may not be a bad thing. With the current toolbox being refined and applied to more and more samples in a wide range of research areas, significant findings are emerging that can and will translate to the clinic. Our genetic sequences alone may not tell us as much as people hoped in 2003, but combined with everything else we now know, they are still likely to underpin a remarkable era of medical progress.
It’s illuminating to look back over the past decade and a half to see how we arrived at our current situation. The Human Genome Project was a mammoth effort that spanned more than a decade, and paying for it was no small feat. To help sell the roughly $3 billion price tag to skeptical constituencies, the scientists involved and their supporters made some sweeping statements about what completing the sequence would mean. Some of them make interesting reading now. Famously, upon release of the first draft of the sequence in 2000, President Bill Clinton said it would “revolutionize the diagnosis, prevention and treatment of most, if not all, human diseases.” The “blueprint of life” held the keys to our health and well-being, it was thought, and it would usher in a new era of medicine. It was a compelling message, and with the sequence finally in hand, anticipation of this new era began to build.
Sure enough, the science crackled along year after year, and in retrospect the field of human genetics expanded with amazing rapidity. Clinical genetics, on the other hand, remained enticing but theoretical. There was initially an eager search for common genetic variants that would differentiate between health and disease, and it turned up disease associations by the dozens. A review paper in 2009, six years post-sequence, documented more than 350 genome-wide association studies (GWAS) that had identified more than 1,600 single nucleotide polymorphisms (SNPs, positions where a single base differs between individuals) associated with more than 200 disease traits. These studies represented a step forward, but as time went by, it became painfully apparent that the step wasn’t as significant as originally hoped.
Why did the early GWAS fall short of expectations? While they did identify some important disease-gene relationships, they revealed little to nothing about underlying mechanisms. Also, SNPs are only a very small piece of the puzzle of human genetic variation (more on that later). Finally, although GWAS sometimes gathered genotype data from thousands of patients, they were limited to hundreds of thousands of data points per sample. It became clear that a lot is going on in the remaining 3 billion-plus bases, and that figuring out what would require sequence data from many more people.
When that review paper was published, it still cost hundreds of thousands of dollars to sequence an entire human genome. Not surprisingly, the number of sequences available was tiny; it wasn’t until a year later that the actress Glenn Close became the first woman to announce she had been sequenced. Then sequencing costs went into a dizzying free fall for the next five years. The magical figure of the $1,000 genome suddenly didn’t seem so far-fetched, and the number of sequenced human genomes and exomes (in which only the protein-coding regions, about 1.5% of the genome, are sequenced) skyrocketed.
Thanks to CRISPR/Cas9, it has also become possible to manipulate the genome with far greater ease and speed than before. Researchers studying an obscure bacterial antiviral defense system first recognized its potential for genome editing only a short time ago, and it has gone from nearly unheard of to ubiquitous in biomedical research in the space of five years. In the process it has spurred some serious ethical debates (technically, we can edit human embryos, with all sorts of implications) and moved biomedical research in new, previously inconceivable directions.
Armed with these formidable tools and many others in computing, microscopy, long-read sequencing, single-cell analysis, microbiome studies and more, researchers have jumped into the human genetic maelstrom with great vigor. They’ve learned a tremendous amount, and there have been clinical strides, especially in oncology, where the ability to thoroughly investigate tumor genetics has led to important medical insights. In many areas, however, the main lesson has been that every answered question seems to reveal another layer of complexity.
The 1000 Genomes Project had reached 2,504 genomes when its members published a reference for human genetic variation two years ago, and the numbers they presented were staggering. Across those genomes, they found a total of 88 million variants. Most (84.7 million) were SNPs, but they also found 3.6 million short segments inserted into or deleted from the genome. There were also about 60,000 even larger anomalies, called structural variants, in which extensive sections are duplicated, deleted, inserted or inverted.
So the “typical” genome differs substantially from the reference sequence published in 2003 (which, for what it’s worth, was derived mostly from the genome of a single anonymous donor from the Buffalo, NY, area). The project’s authors estimated that, on average, an individual genome differs from the reference at 4 to 5 million sites, including 2,000 to 2,500 structural variants that alone affect about 20 million bases of sequence. With statistics like that, it’s somewhat surprising that most of us function physiologically as well as we do. And it’s no wonder that figuring out which variants (or combinations thereof) make us more or less susceptible to disease is an extremely difficult task.
It’s now widely accepted that to figure out what’s really going on, we’ll need more than a few thousand or tens of thousands of genomes. We need many more, perhaps tens of millions. And we need more than just sequences. We need sequence data linked with medical data linked with baseline trait data linked with environmental data. And we need to move beyond sequences when investigating the genomes themselves. After all, our genomes are not linear strings of letters in our nuclei; they are coiled up and arranged in three-dimensional space in ways that can be very important for function and that researchers are just beginning to explore. There are also epigenetic marks, chemical tags added to and removed from DNA that can greatly affect its function. And RNA has been found to do far more in cells than was suspected even a decade ago. And on and on. So how do we get our arms around all this complexity? Researchers are working hard on that question right now.