The evolution of high-throughput genome sequencing

The Search Magazine Article | December 10, 2015


The tools for reading, or “sequencing,” the chemical letters that make up our DNA have evolved rapidly over the last decade. Researchers can now generate DNA sequence data more quickly and at a lower cost than ever before. In fact, sequencing costs have been in a veritable free fall for the last several years, even outpacing a well-known trend in the computer hardware industry called Moore’s Law: the observation that the number of transistors on a chip, and with it computing power per dollar, doubles roughly every two years. (Check out these graphs, courtesy of the National Human Genome Research Institute, which plot the decline in DNA sequencing costs relative to Moore’s Law.) Although Moore’s Law wasn’t conceived with biotech in mind, it has become a common benchmark for measuring technology performance and growth.
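To make the comparison concrete: a cost that halves every two years follows an exponential curve, so after t years a starting cost C₀ falls to C₀ / 2^(t/2). The Python sketch below is a minimal illustration that assumes perfectly steady exponential declines. It projects a cost at Moore’s Law pace and computes the halving time implied by an observed decline. The function names and sample numbers are hypothetical, not NHGRI data.

```python
import math

def moores_law_cost(initial_cost, years, halving_period=2.0):
    """Projected cost after `years` if cost halves every `halving_period` years."""
    return initial_cost / 2 ** (years / halving_period)

def implied_halving_period(start_cost, end_cost, years):
    """Halving time implied by a decline from start_cost to end_cost over `years`,
    assuming the decline was steadily exponential."""
    return years * math.log(2) / math.log(start_cost / end_cost)

# Hypothetical example: a 1,000-fold drop over 6 years.
print(moores_law_cost(1000.0, 6))              # Moore's Law pace: 125.0
print(implied_halving_period(1000.0, 1.0, 6))  # well under 2 years
```

Any implied halving period shorter than two years means costs are falling faster than Moore’s Law would predict.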

What’s behind this rapid change? Well, in short, entirely new ways of reading DNA that diverge from the standard “first-generation” approach, known as the Sanger method. Named for its inventor, Frederick Sanger, this kind of sequencing was the scientific workhorse of the Human Genome Project (HGP), a sweeping international effort to decode the full human genetic blueprint, which published an initial draft of the genome sequence in 2001 and was declared complete in 2003.

To determine the order of the chemical letters (or “bases,” abbreviated A, C, G, and T) that make up the genome, the Sanger method makes new copies of the target DNA. Most of the raw material for these copies is ordinary bases, but a small fraction are specially modified so that copying stops wherever one of them is incorporated. In modern automated versions, each of these chain-terminating bases also carries a fluorescent tag that glows a different color depending on the type of base: for example, green for A, red for T, and so on. The reaction therefore produces copies of every possible length, each ending in a colored base. Sorting the copies by length and reading the color at the end of each one reveals the sequence, one base at a time. (Watch this short video, which explains the basic idea behind Sanger sequencing.)
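The chain-termination logic fits in a few lines of code. The Python sketch below is an idealized toy model, not a description of any instrument. It assumes that every possible stopping point is represented and that no errors occur, and it uses a hypothetical dye assignment for C and G (the text specifies only green for A and red for T). It builds the complementary strand and represents each terminated copy as a (length, color) pair. It then sorts the copies by length and decodes the end colors, much as the instrument does when it separates copies by size.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Green for A and red for T follow the text; blue and black for C and G are assumptions.
DYE = {"A": "green", "C": "blue", "G": "black", "T": "red"}

def chain_termination_fragments(template, seed=0):
    """Return one (length, end_color) pair per terminated copy, in random order."""
    # The polymerase builds the complementary strand; written 5'->3' it is the
    # reverse complement of the template.
    new_strand = "".join(COMPLEMENT[b] for b in reversed(template))
    # Idealization: at least one copy stops at every position, and the copy that
    # stops at position i ends in a dye-labeled version of base new_strand[i].
    fragments = [(i + 1, DYE[base]) for i, base in enumerate(new_strand)]
    random.Random(seed).shuffle(fragments)  # copies are unordered in the reaction
    return fragments

def read_sequence(fragments):
    """Sort copies by length, decode the end colors, and recover the template."""
    color_to_base = {color: base for base, color in DYE.items()}
    new_strand = "".join(color_to_base[color] for _, color in sorted(fragments))
    return "".join(COMPLEMENT[b] for b in reversed(new_strand))

print(read_sequence(chain_termination_fragments("GATTACA")))  # GATTACA
```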

Fueled in part by the HGP, scientists continually tweaked the Sanger method, improving its performance, automating it, and generating ever-longer strings of DNA sequence (called “reads”). While this enhanced the genomic bang-for-the-buck, the technology eventually hit a wall: typical Sanger reads top out at 800 to 1,000 bases. Because the human genome is roughly 3 billion bases long, it has to be pieced together from millions of these reads, and the first full human genome sequence took more than a decade and nearly $3 billion to complete.
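A quick back-of-the-envelope calculation shows why read length matters. The Python sketch below estimates how many reads a genome requires. It also applies the classic Lander-Waterman approximation, which assumes reads land uniformly at random, to estimate what fraction of bases at least one read would cover. The sequencing depths are illustrative assumptions; the text does not say how deeply the HGP sequenced each base.

```python
import math

def reads_needed(genome_length, read_length, coverage):
    """Reads required so that each base is sequenced `coverage` times on average."""
    return -(-genome_length * coverage // read_length)  # integer ceiling division

def fraction_covered(coverage):
    """Lander-Waterman estimate of the fraction of bases covered by >= 1 read."""
    return 1 - math.exp(-coverage)

GENOME = 3_000_000_000  # roughly 3 billion bases
for depth in (1, 10):   # illustrative sequencing depths
    n = reads_needed(GENOME, 1_000, depth)
    print(f"{depth}x: {n:,} reads of 1,000 bases, ~{fraction_covered(depth):.3%} covered")
```

Even a single pass over the genome takes millions of Sanger reads. Under this model, that single pass still leaves roughly a third of the bases unread, which is why redundancy, and therefore cost, piles up.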

If sequencing whole genomes were to become more commonplace (and take less time and money), an entirely new approach was needed.

Enter so-called “next-generation” sequencing (NGS), which relies on different chemistry than Sanger sequencing. Although the various NGS methods differ in the nitty-gritty details, they share two key properties. First, instead of aiming to maximize read length, they all yield fairly short stretches of DNA sequence, from as few as 50 bases to a few hundred. Second, they miniaturize the sequencing process: each piece of DNA is decoded within a microscopic space, such as a tiny spot on a glass surface or a minuscule well, so that millions of pieces can be sequenced simultaneously. (This is why NGS is often called massively parallel sequencing.)
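The payoff of parallelism outweighs the cost of shorter reads. The Python sketch below counts how many instrument runs a genome would take under two sets of assumed, hypothetical parameters. The first is a Sanger-style instrument reading 96 sequences of 800 bases per run. The second is a short-read instrument reading 100 million sequences of 150 bases per run. Both assume 30-fold average coverage. These figures are illustrative and are not the specifications of any particular machine.

```python
def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding on large numbers."""
    return -(-a // b)

def runs_required(genome_length, read_length, reads_per_run, coverage):
    """Instrument runs needed to sequence a genome to the given average coverage."""
    total_reads = ceil_div(genome_length * coverage, read_length)
    return ceil_div(total_reads, reads_per_run)

GENOME = 3_000_000_000  # roughly 3 billion bases
DEPTH = 30              # assumed average coverage

# Hypothetical instrument profiles: read length in bases, reads per run.
sanger_like = runs_required(GENOME, 800, 96, DEPTH)
short_read = runs_required(GENOME, 150, 100_000_000, DEPTH)
print(f"Sanger-style: {sanger_like:,} runs; massively parallel: {short_read:,} runs")
```

Under these assumptions, shorter reads mean more reads overall (600 million versus 112.5 million). Reading them in parallel more than makes up the difference.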

Third-generation technologies take yet another approach. Nanopore sequencing, for example, passes a single molecule of DNA through a tiny opening (a so-called “nanopore”). As each base moves through the pore, it changes the electrical current flowing through the opening, and that change identifies the base.
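A deliberately simplified Python sketch can illustrate the core idea of identifying a base from the current it produces. It assumes that each base produces its own distinct current level (the picoamp values below are invented) and calls each measurement as the base with the nearest level. Real nanopore signals reflect several bases in the pore at once and are decoded with far more sophisticated models, so this is a toy, not how any instrument works.

```python
import random

# Invented mean current levels (picoamps) per base, for illustration only.
LEVELS = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 65.0}

def simulate_signal(sequence, noise_sd=1.0, seed=0):
    """Toy signal: one noisy current measurement per base passing through the pore."""
    rng = random.Random(seed)
    return [rng.gauss(LEVELS[base], noise_sd) for base in sequence]

def call_bases(currents):
    """Call each measurement as the base whose reference level is closest."""
    return "".join(min(LEVELS, key=lambda b: abs(LEVELS[b] - c)) for c in currents)

print(call_bases([81.2, 64.0, 109.5, 96.1]))  # ATGC
print(call_bases(simulate_signal("GATTACA")))
```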

The combined effect of these new technologies is that sequencing costs have fallen dramatically. And driving down costs means scientists get more DNA sequence for their laboratory dollar. Now, it is economically feasible to sequence not one genome, but hundreds of them, even many thousands.

This evolution has ushered in a new era of biomedicine, in which it is possible to probe the human genome on an individual level, revealing variations in one person’s DNA that may have significance for understanding disease biology and could even guide treatment. Many laboratories here at JAX are harnessing these new capabilities. For example, Charles Lee, professor and scientific director at The Jackson Laboratory for Genomic Medicine, is a world leader in applying sequencing and other genome-scale technologies to reveal how individual genomes vary from one another. He is part of an international consortium that just reported the completion of a major project, the 1000 Genomes Project, which sequenced the genomes of over 2,000 people from 26 populations across the globe. Lee also led a recent study that used DNA sequencing to explore the tumors of more than 100 people with gastric cancer, unveiling mutations in genes that could prove to be key drug targets in this form of cancer. These discoveries, and many others, underscore the power of second- and third-generation DNA sequencing technologies to push the frontiers of knowledge in biology and medicine.

Nicole Davis, Ph.D., is a freelance writer and communications consultant specializing in biomedicine and biotechnology. She has worked as a science communications professional for nearly a decade and earned her Ph.D. studying genetics at Harvard University.