Unlocking clinical data

Research Highlight | June 15, 2022

Phenotype is a word used all the time by scientists and doctors, but rarely by others. It’s a handy word though, meaning all of the observable and measurable traits of an organism.

These traits can be obvious physical ones (a mouse with white fur, for example, or a dog with long, droopy ears) or ones discerned only through specific testing, such as cholesterol levels or how much oxygen we can carry to our muscles when we exercise. Phenotype also matters for our health: whether or not we know it, everyone has a clinical phenotype. This is the collection of our traits, be they biochemical, physiological, behavioral or morphological (height, weight, etc.), measured and described each time we go to the doctor. And if we become sick, the disease phenotype (what we refer to as symptoms) captures the differences from our healthy baseline.

Collectively, our clinical and disease phenotypes represent a huge amount of valuable information. Combined computationally with our genome sequences and other specialized molecular data, they can provide vital insight into our own biology, our diseases and how to deliver better medical care. Unfortunately, most phenotype data is not accessible for computation: it is captured in highly variable language and formats and is frequently isolated in non-standardized, non-interoperable Electronic Health Record (EHR) systems. Jackson Laboratory (JAX) Professor Peter Robinson, M.D., M.Sc., a leader in the Global Alliance for Genomics and Health (GA4GH) consortium, has spearheaded an effort to make phenotypic data accessible and computable across platforms. The result, called the Phenopacket schema, is now available.

Clinical phenotype analysis made possible

The rapid advent of genomic sequencing in both research and clinical contexts has produced accompanying progress in standardized exchange formats for sequence and variant data, such as the Variant Call Format (VCF). Complementary exchange standards for phenotypic and other clinical data have lagged far behind, however, in part because of the challenges noted previously. In “The GA4GH Phenopacket schema: A computable representation of clinical data for precision medicine,” published in Nature Biotechnology, Robinson and an international team of collaborators present the rationale behind and capabilities of their Phenopacket schema, which is freely available. Phenopackets streamline exchange and systematic use of phenotypic data and facilitate sophisticated computational analysis of both clinical and genomic information. Providing such a framework for exchanging information about phenotypic traits and abnormalities promises to greatly facilitate precision medicine and precision public health.

At its core, the Phenopacket schema provides a set of rules to organize medically relevant data, and a phenopacket is a structured representation of an individual’s data. The structure is flexible, and almost all elements of a phenopacket are optional. The central element of a phenopacket is the PhenotypicFeature, which is used to describe patient signs and symptoms, laboratory findings, imaging, histopathology findings, and more. Each feature is represented using structured, standardized terms. Other elements include Measurement, which records quantitative and categorical measurements (e.g., a platelet count); Biosample, a description of biological material obtained from a patient, such as a tumor biopsy; MedicalAction, which captures medications, procedures and other medical actions taken for clinical management; and Interpretation, which specifies interpretations of genomic findings.
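To make the idea of a structured, mostly optional record more concrete, here is a minimal sketch in Python. It is a simplified, hand-written illustration rather than a validated phenopacket: the element names echo those mentioned above, but the exact nesting, the helper functions `feature_labels` and `to_json`, and every identifier and value (patient ID, ontology codes, platelet count) are assumptions chosen for the example and should be checked against the official schema documentation before any real use.

```python
import json

# Illustrative sketch only: a simplified phenopacket-like record built as a
# plain dictionary. The layout and all identifiers/values are assumptions
# made for this example, not output from an official Phenopacket library.
phenopacket = {
    "id": "example-phenopacket-1",      # hypothetical record identifier
    "subject": {"id": "patient-1"},     # hypothetical patient identifier
    # Signs and symptoms, each described with a standardized ontology term.
    "phenotypicFeatures": [
        {"type": {"id": "HP:0001873", "label": "Thrombocytopenia"}},
    ],
    # A quantitative measurement, e.g. a platelet count (codes illustrative).
    "measurements": [
        {
            "assay": {"id": "LOINC:26515-7",
                      "label": "Platelets [#/volume] in Blood"},
            "value": {
                "quantity": {
                    "unit": {"id": "UCUM:10*3/uL",
                             "label": "thousand per microliter"},
                    "value": 24.0,
                }
            },
        }
    ],
    # Biosamples, medical actions and interpretations are left out here,
    # reflecting that almost all elements are optional.
}


def feature_labels(packet):
    """Return the labels of all phenotypic features (empty list if absent)."""
    return [f["type"]["label"] for f in packet.get("phenotypicFeatures", [])]


def to_json(packet):
    """Serialize the record to JSON text so it can be exchanged as a file."""
    return json.dumps(packet, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(feature_labels(phenopacket))
    print(to_json(phenopacket))
```

Because each feature is a coded term rather than free text, software can filter or compare records on the term identifiers directly, which is the kind of computability the schema is meant to enable.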

Global Alliance for Genomics and Health

The Phenopacket schema is part of a larger GA4GH community effort to develop a suite of coordinated standards to make genomics more useful and more meaningful for healthcare. The Phenopacket standard is the result of extensive community feedback and is designed to support interoperability between the people, organizations and systems that address human disease and biological understanding on a global level. Increasing the amount and accessibility of computable data across systems supports better disease analysis by integrating genotype, phenotype and other data to further develop precision health. It also provides the foundation for additional software development that facilitates genomic data analysis in the context of clinical information, accelerating innovation and discovery.