Scientific Applications

Iteratively Adjusted Surrogate Variable Analysis (IA-SVA):

High-throughput sequencing data typically harbor unwanted variation from diverse sources. Existing statistical methods for parsing the sources of unwanted variation assume that these sources are uncorrelated with each other, an assumption that is frequently not met in sequencing data due to poor experimental design or technical limitations. We present a statistical framework, Iteratively Adjusted Surrogate Variable Analysis (IA-SVA), that uncovers hidden sources of variation even when these sources are correlated. IA-SVA provides a flexible methodology to i) identify a hidden factor for unwanted heterogeneity while adjusting for all known factors; ii) test the significance of the putative hidden factor for explaining the variation in the data; iii) adjust the data for the detected factor if the factor is significant; and iv) iterate the procedure to uncover further, potentially correlated, hidden factors. Using simulated and real-world RNA-Seq data, we studied the efficacy of IA-SVA for uncovering sources of unwanted variation in bulk and single-cell transcriptomic data and compared it against existing supervised methods (i.e., methods based on a control set of genes) and unsupervised methods. IA-SVA outperformed alternative methods in terms of statistical power, Type I error rate, and accuracy in detecting and estimating the hidden factors, and proved to be an effective method in the absence of a negative control set. As a case study, we applied IA-SVA to single-cell RNA-Seq data from human islets and showed that our method can capture cell types within a cell composition with high accuracy and detect variation that affects only a subset of alpha cells due to the high expression of a small number of genes. An R package for IA-SVA with example case scenarios is freely available from https://github.com/UcarLab/IA-SVA/. For more information contact donghyung.lee@jax.org.
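The four steps above can be illustrated with a minimal Python sketch. It is not the IA-SVA R package's implementation: it assumes a genes-by-samples matrix `Y` and a samples-by-covariates matrix `X` of known factors (all names here are hypothetical), takes the leading singular vector of the residuals as the candidate factor, and uses a simple permutation test for significance. Because each candidate is extracted from residuals, it is uncorrelated with the known covariates by construction, so the sketch shows only the identify, test, adjust, iterate loop and not the method's handling of correlated factors.

```python
import numpy as np


def residualize(Y, X):
    """Remove from each row of Y (genes x samples) the part explained by X (samples x covariates)."""
    beta, *_ = np.linalg.lstsq(X, Y.T, rcond=None)
    return Y - (X @ beta).T


def top_factor(R):
    """Return the leading right singular vector of R and its share of the total variance."""
    _, s, Vt = np.linalg.svd(R, full_matrices=False)
    return Vt[0], s[0] ** 2 / np.sum(s ** 2)


def find_hidden_factors(Y, X, max_factors=5, n_perm=50, alpha=0.05, seed=0):
    """Illustrative loop: identify, test, adjust, iterate (an assumption, not the package's algorithm)."""
    rng = np.random.default_rng(seed)
    factors = []
    for _ in range(max_factors):
        # i) candidate hidden factor after adjusting for all known factors
        R = residualize(Y, X)
        factor, observed = top_factor(R)

        # ii) permutation test: shuffle each gene's residuals across samples
        #     independently, re-adjust, and recompute the variance share
        null = np.empty(n_perm)
        for b in range(n_perm):
            permuted = residualize(rng.permuted(R, axis=1), X)
            null[b] = top_factor(permuted)[1]
        p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
        if p_value > alpha:
            break  # candidate is not significant, so stop iterating

        # iii) treat the significant factor as known, then iv) iterate
        factors.append(factor)
        X = np.column_stack([X, factor])
    return np.array(factors).reshape(len(factors), Y.shape[1])
```

Calling `find_hidden_factors(Y, np.ones((n_samples, 1)))` with an intercept-only design returns one row per detected factor. The permutation count and significance threshold are arbitrary illustrative defaults.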

Civet:

Civet is a framework for developing command-line analysis pipelines that execute through the Torque batch system used on many High Performance Computing (HPC) systems. A Civet pipeline is defined by an XML file that describes the files operated on by the pipeline and the steps that produce or consume those files. Each tool that may be invoked by the pipeline is defined by its own XML file. These tool definitions may be shared between pipelines, allowing multiple pipelines to make use of a common set of tools. Civet has been deployed to Amazon AWS using Amazon's CfnCluster tool as a proof of concept, and work is currently underway to develop a native Google Cloud Platform implementation. Civet is open source and can be found on GitHub: https://github.com/TheJacksonLaboratory/civet. For more information contact glen.beane@jax.org.

CKB:

JAX-CKB is a powerful tool for interpreting complex genomic profiles and represents a valuable resource for clinicians and for translational and clinical researchers. JAX-CKB advances JAX's mission to discover genomic solutions for disease and empower the global biomedical community in the shared quest to improve human health. The public version of CKB, available at https://ckb.jax.org/, has curated content that is updated daily. In addition to the publicly available CKB, Computational Sciences has built a number of applications and services to streamline creating and maintaining the content within CKB. For more information contact daniel.durkin@jax.org.