Genome sequencing and the next big thing

Tech Corner | October 29, 2018

Charles Lee, Ph.D., D.Sc., FACMG, FRSC

Professor, The Robert Alvine Family Endowed Chair

The study of structural genomic variation in human biology, evolution and disease.

It’s difficult to express how much has changed in the field of genome sequencing since 1997.

The most familiar example is the Human Genome Project, which was churning forward at that time, as it did throughout the 1990s. The first human genome sequence would be completed (well, almost—more on that later) early in the next millennium after more than a decade of effort by large research teams and an investment of approximately $3 billion. Today a similar sequencing exercise has become routine, takes less than a day and costs about $1,000, give or take a few dollars.

In 1997, however, today’s sequencing power was almost unimaginable to Chia-Lin Wei, Ph.D., who was beginning work with fruit flies as a postdoctoral associate. She used a machine known as the 3730, the workhorse sequencer of the time, to churn out sequence data for her research, sometimes hundreds of thousands of nucleotides (the four kinds of DNA bases, A, C, G, and T) a day. But that wasn’t good enough, and Wei has been pushing the envelope of what’s possible ever since.

The right place at the right time

Fast forward to the present day at JAX, where Wei is the Director of Genome Technologies. It is, she says, the perfect place for doing what she does.

“Genomics is getting to the point where it’s having a lot of practical applications for human health and in clinical settings,” she says. “It’s the right time and JAX is the right place for me.”

So what does she do? Simply put, she wants to make her group at JAX a knowledge hub for advanced sequencing technologies at the center of JAX’s research program. Currently there are multiple ways to generate sequencing information from DNA, RNA and chromatin, and each has advantages and disadvantages. Keeping up with technology improvements, fine-tuning protocols that produce the best sequence data, improving efficiency and developing better analysis tools are all vital for driving progress in an already dynamic field. And as capabilities expand, her group is also developing better, more accurate ways to sequence and analyze RNA, epigenetic markers (chemical groups added to DNA and histones), chromatin structure (how the DNA loops and folds in 3D and is packed within the nucleus) and more.

“Scientists want to produce data that’s relevant to their work,” Wei says. “With sequencing, I will always encourage them to keep an open mind and encourage them to try new things. Young technology is unfamiliar and may be under-commercialized, but it also may rapidly emerge as something very important and valuable for research moving forward.”

The long read frontier

Most sequencing to this point has been done on what are known as short read sequencers. The technology has been highly refined over the past decade or so, primarily by a company known as Illumina, and it’s now highly accurate, fast and relatively inexpensive. Also, most of the genomic sequence protocols and analysis pipelines have been set up to generate and manage short read sequence data. Improvements in short read sequencing technology have spearheaded the genomics revolution, and huge amounts of important data have been generated using it.

On the other hand, there are several disadvantages too. Short read sequencers require a lot of work to prepare the DNA just right and can only sequence short segments of DNA, on the order of 250 bases. Divide three billion by 250 and you get a sense of how many pieces there are to the genome puzzle that still needs to be assembled after sequencing. In areas that are highly repetitive—think of thousands of bases in a row that are all GCGCGCGC—it’s impossible to reassemble them correctly.
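The puzzle arithmetic and the repeat problem can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the reads are idealized and error-free, and the toy sequences, flank strings and function names are hypothetical, not taken from any real genome or assembler. It shows that two sequences with different numbers of GC repeats yield exactly the same set of short reads, while reads long enough to span the repeat tell them apart.

```python
import math

GENOME_BASES = 3_000_000_000  # round figure used in the text
READ_LENGTH = 250             # short read length cited in the text


def min_reads(genome_bases: int, read_length: int) -> int:
    """Fewest non-overlapping reads needed to tile the genome once."""
    return math.ceil(genome_bases / read_length)


def read_set(sequence: str, read_length: int) -> set:
    """All distinct substrings of the given length (idealized, error-free reads)."""
    last_start = len(sequence) - read_length
    return {sequence[i:i + read_length] for i in range(last_start + 1)}


def toy_genome(repeat_units: int) -> str:
    """Hypothetical toy sequence: unique flanks around a run of GC repeats."""
    return "ATTACA" + "GC" * repeat_units + "TTGACA"


print(min_reads(GENOME_BASES, READ_LENGTH))  # 12000000

shorter = toy_genome(10)  # 20-base repeat
longer = toy_genome(15)   # 30-base repeat

# Reads shorter than the repeat: both sequences give the same read set,
# so the repeat length cannot be recovered from the reads alone.
print(read_set(shorter, 8) == read_set(longer, 8))    # True

# Reads long enough to span the repeat and touch both flanks differ.
print(read_set(shorter, 32) == read_set(longer, 32))  # False
```

Real short read assembly is far more involved (overlapping reads, sequencing errors, paired ends), but the ambiguity shown here is the core reason repeats resist reassembly.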

There are areas of the genome, such as telomeres, the structures at the end of chromosomes, that remain inaccessible to short read sequence protocols. As a result, until recently even the most complete human genome sequences have missed hundreds of millions of bases.

Finally, there are important differences between genomes that don’t change the reassembled sequence. Known as structural variants, these are large pieces of DNA that are added or deleted on one chromosome and not the other, for example, or are reversed in orientation. They are very difficult to detect with short read technology and have only recently been recognized as important contributors to genome variation and disease.

Emerging technologies include improvements to long read sequencing, which provide the capability to sequence much longer DNA segments. One long read technology, developed by Pacific Biosciences, is used at JAX for specific projects. Researchers are also using nanopore sequencing, a promising method that involves threading a DNA molecule through a microscopic nanopore and detecting bases directly. Nanopore reads can be extremely long: scientists achieved the first million base pair read late last year and already doubled it to two million by May 2018.

The ability to get such long reads, along with generating the sequence directly from the DNA molecule without a lot of preliminary preparation, opens up a new world for many research areas. Nanopore technology has already been used effectively to track infectious disease outbreaks in the field (Zika, Ebola etc.), and now Wei wants to expand its applications in the laboratory.

“Detecting structural variants has many applications at JAX, particularly regarding cancer,” she says. “Charles Lee is a pioneer in that field, Roel Verhaak is studying extra-chromosomal DNA in glioma, Christine Beck is looking at transposons and duplicated sequences, and Ed Liu researches duplications in cancer. There are many relevant research programs, looking at the issue in a variety of ways.”

Chia-Lin Wei works at a PacBio RS II DNA sequencer in the Genome Technologies lab at JAX Genomic Medicine in Connecticut.

Industrial genomics

Wei’s work experience has prepared her well to push technology to the limit. After her post-doc, she was lured away from academia to work for a functional genomics company that was trying to decode all protein-coding genes using a sequencing approach. Her skill at implementing what she calls an “industrial genomics operation” caught the eye of a scientist building a genomics institute in Singapore. That scientist, Ed Liu, is now president and CEO of JAX, but at the time he led the Genome Institute of Singapore (GIS), and he recruited Wei to build a similar operation at GIS.

“Ed wanted to reveal the complexities involved with coding regions in the genome in both normal function and disease,” says Wei, “and to use these discoveries as a foundation for the genomics program. GIS is also where I first worked with Yijun (Ruan, now a professor at JAX), who was looking at genomic 3D structures and chromatin complexity using sequencing as a discovery tool.”

After eight years in Singapore, Wei sought to return to the United States and its educational system for her family. Eddy Rubin, a scientific advisor for GIS, was then the Director of the Joint Genome Institute (JGI), a U.S. government research institution run by the Department of Energy. Rubin wanted to move beyond genome sequencing to add functional annotation and interrogation, and he needed someone to lead the technical charge for the change. In addition to making a thorough update to JGI’s sequencing equipment, Wei expanded her perspective so that she could anticipate emerging technologies. At the top of her list was nanopore sequencing, an interest she brought with her to JAX in 2016.

Her career has merged science with other skills, and Wei wouldn’t have it any other way.

“The operation of high throughput genomic sequencing operations in past decades was a different type of science,” says Wei. “You needed to build an entire workflow with quality standardization, and work with large-scale automation and robotic instruments. The lab looked more like a warehouse. It’s changed now, but there is still a lot of problem-solving and you need to like to build things.” She laughs. “I must have some hidden engineer traits, but I love it.”

Future benefit

As sequencing begins to play a larger role in medicine, it’s important to note the differences between clinical and research sequencing. In the clinic, accuracy is paramount, and long read output hasn’t yet come close to matching the accuracy of short read data.

Think of it this way: if a sequence run is 99.999 percent accurate, that means one in a hundred thousand bases is wrong. That sounds great until you realize that in a 3.2 billion base genome, you’ll still have more than 30,000 errors in your data for every genome sequenced. And while it’s come a long way in a short time, long read technology is not yet near even 99 percent accuracy.
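The arithmetic above is easy to check. The sketch below is a back-of-the-envelope calculation only: it assumes a single pass over the genome with errors spread evenly across bases, and the function name is hypothetical. Accuracy is passed as a string so that Python's Decimal type avoids floating-point rounding.

```python
from decimal import Decimal

GENOME_BASES = 3_200_000_000  # genome size used in the text


def expected_errors(genome_bases: int, accuracy_percent: str) -> int:
    """Expected number of wrong bases at a given per-base accuracy.

    Assumes one pass over the genome and uniformly distributed errors.
    """
    error_rate = (Decimal(100) - Decimal(accuracy_percent)) / Decimal(100)
    return int(genome_bases * error_rate)


for accuracy in ("99.999", "99"):
    errors = expected_errors(GENOME_BASES, accuracy)
    print(f"{accuracy}% accurate -> about {errors:,} wrong bases")
# 99.999% accurate -> about 32,000 wrong bases
# 99% accurate -> about 32,000,000 wrong bases
```

At 99 percent accuracy the same genome would carry roughly 32 million errors, a thousand times more than at 99.999 percent, which is why the gap matters so much in the clinic.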

So how can it be applied now? One important area is rare disease diagnosis.

“Using short reads, we can identify less than 50 percent of disease-causing mutations, even for those caused by a single gene,” says Wei. “That’s because rare diseases often involve changes in non-coding regions or high repeat regions. Or they are caused by structural variants in the genome. All of these are very difficult to detect with short reads. We are now starting to apply long read technology to show how it can resolve these issues.”

As mentioned, another area is cancer, which often has genomic changes and complexities that extend well beyond what can be found in a simple linear sequence. With the new capabilities, “tumor sequencing” can be transformed, providing new insight into therapy targets and ways to prevent recurrence.

Even with so many things on her plate, Wei doesn’t hesitate when asked about her primary current goal.

“I would like to establish JAX as a leader in using long read sequencing for genome and transcriptome [RNA] analysis,” she says. “We need to generate the best long reads possible and develop robust computational pipelines. With these, we can discover what variants are present in cancer and other diseases as well as how they affect disease initiation and progress.”