Finding the signal in the noise
By Mark Wanner
GeneWeaver is a powerful web-based data and analysis software system designed to find a convergent signal in noisy functional genomics data.
Waves of data. Anomalies and noise. Analysis and excitement and hopes dashed.
Are we talking about the search for extraterrestrial intelligence (SETI)? Or genomics research?
Well, both. It may seem odd to liken the two — call it the search for inner meaning versus outer meaning — but both involve wading through vast amounts of “noisy” data in hopes of finding a signal. And even when a preliminary blip generates excitement (as it did last summer in the SETI community), a lot more data is needed before it can be determined if the so-called signal means anything.
Sadly, despite the occasional anomalies spotted, everything in SETI has turned out to be just noise so far. Fortunately for genomics researchers, however, there are real and important biological signals in the data noise. Unfortunately, turning the preliminary blip into something robust and validated can be quite a difficult task. Is that gene expression change (or genotype-phenotype association or disease correlation or …) actually real and relevant, or is it an anomaly in one or a few data sets containing billions of data points?
That’s where GeneWeaver comes in. GeneWeaver is a web-based software system that combines a genomics data repository with powerful analysis tools. It lets biologists add their own experimental data to a huge body of existing data to help them find the signal in the noise: GeneWeaver holds more than 130,500 gene sets consisting of nearly 2 million genomic features across several species, including human. And a new release scheduled for April will greatly improve data storage and curation capabilities.
“Many questions in genomics can be addressed by working with sets of genes,” says the researcher who spearheaded the development of the original GeneWeaver release more than a decade ago. “But different experiments using different methods generate piles of noisy data. How can you combine those data to take advantage of the diverse results to look for a convergent signal that generalizes across conditions?”
Scientists are able to upload their own data to GeneWeaver, share them publicly or among private user groups, and look at them in conjunction with other uploaded data sets, as well as curated data repositories. Looking at different kinds of data across species, it’s possible to see, for example, if changes in gene expression have been connected with disease pathology. A result that implies a gene may function in a certain way or give rise to a measurable trait can be greatly strengthened if similar results are found in other species. And so on.
“Because it collects so much data from different sources, GeneWeaver is really good at ‘guilt by association’ predictions of gene function,” says a researcher who is using GeneWeaver for his research. “For example, we’re looking to see if fertility status serves as a biomarker for overall health. If someone is infertile, does it correlate with other health issues? Using GeneWeaver, we can look at a list of infertility genes in mice and see if they are enriched for genes associated with other diseases, such as cardiovascular disease and cancer. That provides an early biomarker for co-occurring diseases and reveals a general genetic architecture related to infertility. That can be highly useful for then going back and finding new fertility genes as well.”
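For readers curious what "enriched" means in practice, here is a minimal sketch of one common way to score the overlap between two gene sets: a Jaccard similarity plus a one-sided hypergeometric test. It is an illustration only, not a description of GeneWeaver's own tools or code. The function names, the gene identifiers and the universe size are all made up for the example.

```python
from math import comb


def jaccard(set_a, set_b):
    """Share of genes common to both sets, relative to their union."""
    set_a, set_b = set(set_a), set(set_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def overlap_p_value(query, reference, universe_size):
    """Chance of seeing at least the observed overlap if the query genes
    were drawn at random from a universe of `universe_size` genes
    (one-sided hypergeometric test). Returns (overlap, p_value)."""
    query, reference = set(query), set(reference)
    k = len(query & reference)   # observed overlap
    n = len(query)               # size of the query set
    K = len(reference)           # size of the reference set
    N = universe_size            # assumed number of genes that could be drawn
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return k, tail / comb(N, n)


# Hypothetical identifiers, not real gene symbols or real data.
infertility_genes = {"geneA", "geneB", "geneC", "geneD"}
heart_disease_genes = {"geneA", "geneB", "geneC", "geneX", "geneY"}

overlap, p = overlap_p_value(infertility_genes, heart_disease_genes,
                             universe_size=20)
print(overlap, round(p, 4), jaccard(infertility_genes, heart_disease_genes))
```

A small p-value says the two lists share more genes than chance alone would explain, which is the "guilt by association" signal in its simplest form. Real analyses would use a realistic gene universe, which often runs to tens of thousands of genes, and would correct for testing many gene sets at once.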
Genomics data is full of vague signals. A gene may be implicated in contributing to a function, trait or disease, for example, but the association may not be strong, or the data set may simply be too limited to draw conclusions. GeneWeaver quickly provides far more depth and precision to these signals, yielding testable hypotheses to help guide future research.
“It puts biologists in the driver’s seat,” says Chesler, “by letting them work with big genomics data sets in the ways most valuable to their research. They are able to place their own data in context of other experimental results and accumulated knowledge in genomics to ask questions and find signals that would be beyond their capabilities using more limited data sources and tools.”