Much has been written about the proliferation of biomedical research data being generated these days. Genome sequencing data is now just the tip of the iceberg. Researchers are producing RNA sequencing data, proteomics, single-cell analytics, phenotyping (trait) data, and much more. It is accumulating across studies of myriad subjects and conditions (e.g., RNA splicing in cancer, tau spindles in Alzheimer’s disease, pancreatic islet dysfunction in type 2 diabetes, and on and on) in multiple species, from human patients themselves down to yeast, worms and fruit flies. And, of course, mice.
Each dataset in isolation, however, cannot resolve most of the questions researchers seek to answer, especially when it comes to complex processes, conditions and diseases that require information from many studies. To accelerate progress, it will be necessary to assemble experimental data from the many labs working on particular problems across species. Unfortunately, a variety of barriers, from disincentives for sharing to incompatible data formats, have limited researchers’ ability to fully leverage the modern data avalanche.
GeneWeaver is a system that helps to overcome these barriers by allowing researchers to combine their data and interpret the result using a variety of analysis tools. GeneWeaver integrates and stores experimental genomic data from primary publications, external databases and user-submitted datasets, catalyzing researchers’ ability to apply big data approaches and analysis methods across species and data formats. Illustrating GeneWeaver’s power, a research team analyzed aging research data related to several questions in gerontology. In a paper, “Integration of heterogeneous functional genomics data in gerontology research to find genes and pathways underlying aging across species,” published in PLoS One, the researchers identified pathways and novel conserved genes that influence longevity.
The researchers investigated several aspects of aging, among them the pathways that underlie the life-extending properties of caloric restriction, which has been shown to extend life span in many organisms, and of species-specific life-extending drugs. Combining the cross-species genes (homologs) identified in genome-wide studies with the genes affected by sirolimus, also known as rapamycin (mouse), and resveratrol, or 3,5,4’-trihydroxystilbene (fruit fly), yielded a single gene, HSPA1A, common to all pathways. HSPA1A is a member of the HSP70 family, which has previously been implicated in aging processes and longevity. Further study of this gene may provide additional insight into how these interventions extend life span.
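The combination step described above amounts to intersecting gene sets once their genes have been mapped to a shared homolog identifier. A minimal sketch in Python follows, assuming that mapping has already been done. The set names and the identifiers `h1` through `h6` are hypothetical placeholders, not data from the study, and the code does not represent GeneWeaver’s own tools or interface.

```python
from functools import reduce

# Hypothetical toy gene sets, already mapped to shared homolog cluster IDs.
# These identifiers are placeholders for illustration, not study results.
gene_sets = {
    "caloric_restriction_genome_wide": {"h1", "h2", "h3", "h4"},
    "sirolimus_mouse": {"h2", "h3", "h5"},
    "resveratrol_fly": {"h3", "h6"},
}


def common_homologs(sets_by_name):
    """Return the homolog IDs that appear in every gene set."""
    if not sets_by_name:
        return set()
    # Fold pairwise intersection across all sets.
    return reduce(set.intersection, sets_by_name.values())


shared = common_homologs(gene_sets)
print(sorted(shared))
```

In this toy input only one identifier survives the intersection, which mirrors how a strict intersection across several sets can narrow a long candidate list down to a single gene.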
The team also integrated data from 73 aging-associated gene sets across six species (yeast, worm, fly, rat, mouse, human) to search for highly conserved aging-related genes. The most highly connected gene, Cd63, was present in 12 gene sets across four species, providing strong evidence of a role in aging mechanisms. To validate the finding, collaborating investigators knocked down the orthologous gene in worms and observed a 10.5% extension of mean life span. Although no documented variants of the gene have shown an aging effect in humans, the result warrants further investigation of the role of Cd63.
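One simple way to picture “most highly connected” is to count, for each gene, how many gene sets and how many species it appears in, then rank genes by those counts. The Python sketch below does this under stated assumptions: the records, gene set IDs and homolog identifiers are hypothetical, and the ranking rule is an illustrative choice rather than the method used in the paper or by GeneWeaver.

```python
from collections import defaultdict

# Hypothetical records: (gene set ID, species, homolog cluster IDs).
# Placeholder values for illustration only, not data from the study.
records = [
    ("GS1", "mouse", {"h1", "h2"}),
    ("GS2", "human", {"h1", "h3"}),
    ("GS3", "worm", {"h1"}),
    ("GS4", "mouse", {"h2"}),
]


def rank_by_connectivity(records):
    """Rank genes by number of gene sets, then by number of species."""
    sets_per_gene = defaultdict(set)
    species_per_gene = defaultdict(set)
    for gene_set_id, species, genes in records:
        for gene in genes:
            sets_per_gene[gene].add(gene_set_id)
            species_per_gene[gene].add(species)
    # Sort descending by both counts; gene ID breaks ties deterministically.
    ranked = sorted(
        sets_per_gene,
        key=lambda g: (-len(sets_per_gene[g]), -len(species_per_gene[g]), g),
    )
    return [(g, len(sets_per_gene[g]), len(species_per_gene[g])) for g in ranked]


ranking = rank_by_connectivity(records)
for gene, n_sets, n_species in ranking:
    print(gene, n_sets, n_species)
```

Tracking species separately from gene sets matters because a gene that recurs in many sets from one species is weaker evidence of conservation than one that recurs across several species.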
Integrating and analyzing big data will become increasingly important as more data, including the aging-related functional genomic data analyzed here, is generated and made available. The GeneWeaver database continues to grow in the number and variety of datasets it contains, and its utility and power are growing as well. Moving forward, advances in data resource aggregation and analytics will enable the research community to readily identify convergent molecular evidence for novel mechanisms of aging and other health and disease-related processes.