If you compare your genome sequence with the reference human genome (the standard template, as it were), it will mostly agree. Nonetheless, when comparing more than 3 billion data points, differences in even a tiny fraction add up to large numbers. In fact, early large-cohort genomics work indicates that most individual genomes differ from the reference at 4.1 million to 5 million sites, affecting approximately 20 million bases when insertions, deletions and structural variants are included.
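To put those figures in proportion, here is a quick back-of-the-envelope calculation. It assumes a round genome length of 3 billion bases (the text says only "more than 3 billion"), so the percentages are approximate.

```python
# Rough share of a genome that differs from the reference.
# GENOME_BASES is an assumed round figure, not an exact genome length.
GENOME_BASES = 3_000_000_000


def percent_of_genome(count: int, genome_bases: int = GENOME_BASES) -> float:
    """Return `count` as a percentage of the genome length."""
    return 100.0 * count / genome_bases


# Figures quoted in the text: 4.1 to 5 million sites, about 20 million bases.
low_sites = percent_of_genome(4_100_000)
high_sites = percent_of_genome(5_000_000)
affected_bases = percent_of_genome(20_000_000)

print(f"Variant sites: {low_sites:.2f}% to {high_sites:.2f}% of positions")
print(f"Bases affected: {affected_bases:.2f}% of the genome")
```

Under that assumption, variant sites make up roughly 0.14% to 0.17% of positions and about 0.67% of bases are affected: small fractions that still amount to millions of differences per person.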
The sheer magnitude of the variation makes it extremely difficult to assess which of these differences (or combinations thereof) might affect health and disease in a person. It also makes it imperative to share and compare data across as many individual genomes as possible to find patterns and reveal the variants that truly have clinical relevance.
Heidi Rehm, Ph.D., FACMG, is working on many fronts to make interpretation of genomic variants more accurate and more useful in the clinic. Improving the environment for genomic data sharing (e.g., Global Alliance for Genomics & Health) is one. Aggregating and annotating variant data, resolving variant interpretations and sharing the results with the clinical genomics community (e.g., Clinical Genome Resource, or ClinGen) is another.
At the Human Genome Meeting in Barcelona earlier this year, Rehm discussed her work and future goals.
Q: Assessing the clinical significance of genomic variants is a huge effort. What is the current status?
A: We’re working to dig evidence out of the literature and build upon prior work, which is much easier than starting from scratch. We’re also working to identify differences in variant interpretation between labs, which has been a problem without consistent standards and wider data sharing. We need to continue to improve the quality of what’s returned to patients, but it can be hard to find the data.
Q: What’s being done to encourage data sharing and standardization?
A: So far there’s been more of a carrot approach, emphasizing the benefits of volunteering to share data, but there’s some thinking that there can be some requirements built in as well. For example, agencies that accredit clinical labs could make data sharing a requirement for quality assurance. For research, more journals could make open data a requirement for publication. But we still need to develop a better system to provide an easy way for labs to become part of a larger sharing environment.
Q: How would something like that work?
A: The ClinVar database aggregates interpretations of genomic variation. At this time, labs have to manually submit data through a portal, but I’d like to have an API (application programming interface) that would allow labs in the network to automatically share data in real time without having to actively submit it. That way ClinVar would never be out of date when an important finding about variant interpretation is made, and labs would find it far easier to participate in the data sharing effort.
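As an illustration of the idea, the sketch below shows how a lab's system might pick out interpretations that changed since its last sync and package them for automatic sharing. It is a minimal sketch, not ClinVar's actual submission format or API: the `Interpretation` record, its field names, the `pending_updates` and `build_payload` helpers, the lab identifier and the variant names are all hypothetical, and no network call is made.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class Interpretation:
    """One lab's interpretation of one variant (hypothetical schema)."""
    variant: str
    classification: str
    updated_at: datetime


def pending_updates(records, last_sync):
    """Return the interpretations changed since the last successful sync."""
    return [r for r in records if r.updated_at > last_sync]


def build_payload(lab_id, records):
    """Serialize the records as JSON, ready to hand to a sharing service."""
    return json.dumps(
        {
            "lab_id": lab_id,
            "interpretations": [
                # Replace the datetime with an ISO 8601 string so it is JSON-safe.
                {**asdict(r), "updated_at": r.updated_at.isoformat()}
                for r in records
            ],
        },
        sort_keys=True,
    )


# Placeholder data; the variant names are not real identifiers.
records = [
    Interpretation("example-variant-1", "likely benign",
                   datetime(2017, 1, 10, tzinfo=timezone.utc)),
    Interpretation("example-variant-2", "pathogenic",
                   datetime(2017, 3, 2, tzinfo=timezone.utc)),
]
last_sync = datetime(2017, 2, 1, tzinfo=timezone.utc)

pending = pending_updates(records, last_sync)
payload = build_payload("example-lab", pending)
print(payload)
```

Only the record updated after the last sync ends up in the payload, which is the behavior that would let a shared database stay current without anyone filling in a portal form.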
Q: How much genome data is enough for variant interpretation? How many people need to be sequenced to learn what we need to know?
A: That’s hard to answer, other than more is better. Obviously, the data we have available to us today is sufficient to answer some questions, and we could answer more with the existing sequence data if data sharing were better coordinated, with more and better genotype-phenotype relationship annotations. And while we can and will answer a lot more questions with more data sets, some would probably remain unanswered even if we had every genome from every person on earth.
Q: What are your plans for the near future?
A: I’ll continue to work with ClinVar, expanding the addition of new variant interpretations and resolving discrepancies. ClinGen is also scaling efforts to define the validity of reported gene-disease relationships. We plan to loop in more experts for the ongoing work in these areas.
A new project involves building a health innovation platform on top of electronic health record (EHR) systems and providing physician support. If we better understand the questions that doctors frequently ask, we can deliver the data that they need, including genetic data. The overall goal is to improve the efficiency, cost and effectiveness of healthcare by facilitating doctors’ decision making at the patient care level.
Mark Wanner followed graduate work in microbiology with more than 25 years of experience in book publishing and scientific writing. His work at The Jackson Laboratory focuses on making complex genetic, genomic and technical information accessible to a variety of audiences. Follow Mark on Twitter at @markgenome.