Tools, Data & Databases

This site lists tools and resources related to data science that have been developed by JAX faculty and staff or that support research collaborations with JAX investigators. No animal resources are listed here. For a listing that includes animal resources, visit the Research Tools and Resources page.

Name	Description

aln Tools Access Download Associated Laboratories/Individuals The Churchill Lab Contact [email protected]	Processes NGS alignments into a sparse compressed incidence matrix. Stores the matrix in a pre-defined binary format for efficient downstream analysis and storage.
AMP-AD Knowledge Portal Access Online & Download Associated Laboratories/Individuals The Howell Lab The Carter Lab Contact [email protected]	The AMP-AD Knowledge Portal is a platform to access data, analytical results and tools generated within the National Institute on Aging’s AD Translational Research Program. All projects within this program operate as an open science collaboration and openly share resources early in the research life cycle for evaluation and reuse. The Portal is named for the first consortium to be initiated, the Accelerating Medicines Partnership in Alzheimer’s Disease Target Discovery and Preclinical Validation Project (AMP-AD). The majority of the content in the Portal is genomic data generated from human samples or experimental model systems, including the MODEL-AD Consortium. The Portal also contains data summaries and bioinformatic analytical results, including some generated by teams within individual projects and some generated through multi-team working groups. Bioinformaticians can download these resources for use in their own research. You will need a Synapse account to access the data in the AMP-AD Knowledge Portal. The AMP-AD Knowledge Portal is funded by the National Institute on Aging. It is developed and maintained by Sage Bionetworks.
CloudNeo Access Online & Download Associated Laboratories/Individuals The Chuang Lab Contact [email protected]	CloudNeo is a cloud-based computational workflow for identifying patient-specific tumor neoantigens from next generation sequencing data.
Cre Portal Contact [email protected]	The Cre Portal contains curated data about all recombinase-containing transgenes and knock-ins developed in mice to provide a comprehensive resource delineating known recombinase activity patterns and allowing users to find relevant mouse resources for their studies. Conditional mutagenesis is a powerful technique that allows studies of gene function where knockout homozygotes are lethal or where the mutation affects multiple systems. This technique is dependent on activity of a recombinase in a tissue or life stage of interest. Information on cre activity and specificity in mice that express recombinases is collected in the Cre Portal. Mouse resources expressing recombinases in a spatial or temporal manner can be queried by tissue or by driver of recombinase expression. Autocomplete of these search boxes will show all available terms; terms in bold indicate data is available for that term. Downloadable reports, data metrics and links to related queries are also available. Data are also available at www.mousemine.org via custom queries and API.
Digital Pathology Pipeline Associated Laboratories/Individuals The Korstanje Lab Ron Korstanje, Ph.D., Professor, Evnin Family Chair	Tools for automatic glomerulus identification and histological quantification from scanned PAS slides.
EMASE Access Download Associated Laboratories/Individuals The Churchill Lab Contact [email protected]	An expectation maximization algorithm for allele specific expression. Primary author K. Choi of the Churchill Lab. Full description: Hierarchical analysis of RNA-seq reads improves the accuracy of allele-specific expression.
FusorSV Associated Laboratories/Individuals The Lee Lab Contact [email protected]	FusorSV is a data mining-based framework that allows for comprehensive and robust detection of Structural Variations (SV) from next generation sequencing datasets. We built the SV Engine (SVE), which includes all tools, including FusorSV, that can be used for analysis of new datasets. SVE also includes data models built using 1000 Genomes SV callsets as ground truth. Comprehensive and accurate Structural Variation (SV) discovery from next generation sequencing data remains a major challenge. Popular approaches to overcome performance limitations of existing SV-calling algorithms are to use complementary algorithms to determine the SV loci and then merge them in a heuristic manner. However, such approaches do not consider the strengths and weaknesses of individual algorithms and either under- or over-merge the variant loci, resulting in false SV calls. Here, we present FusorSV, an open source tool that uses a data mining approach to assess performance and merge callsets from an ensemble of SV-calling algorithms. We developed a FusorSV SV calling model using an ensemble of eight SV-calling algorithms for the analysis of 27 deep-coverage human genomes from the 1000 Genomes Project (1000GP). For an easy-to-use SV detection pipeline, we also built the Structural Variation Engine (SVE), which is capable of performing gold standard SV analysis for whole genome sequencing projects.
g2g Tools Access Download Associated Laboratories/Individuals The Churchill Lab Contact [email protected]	Genome Editing tools. Creates custom genomes by incorporating (phased) SNPs and indels into reference genome, extracts regions of interest, e.g., exons or transcripts, from custom genomes, and converts coordinates of files (bam, gtf, bed) between two genomes.
gbrs Access Download Associated Laboratories/Individuals The Churchill Lab Contact [email protected]	Genotype-free genome reconstruction and ASE quantification. Example use case: deducing potential sample mixups by comparing GigaMUGA haplotype reconstructions to haplotypes deduced from an islet RNA-seq-based genotype-by-sequencing method (mentioned in Chick/Munger et al. Defining the consequences of genetic variation on a proteome–wide scale, 2016 PMID: 5292866).
Gene Expression Database (GXD) Access Online Contact [email protected]	Gene Expression Database (GXD) is a database project that integrates different types of gene expression information from the mouse and provides a searchable index of published experiments on endogenous gene expression during development. GXD is a core component of the Mouse Genome Informatics (MGI) resource, which is the international community database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease. GXD stores primary data from different types of expression assays. By integrating these data, GXD provides, as data accumulate, increasingly complete information about the expression profiles of transcripts and proteins in different mouse strains and mutants. GXD also describes expression patterns using an extensive, hierarchically-structured dictionary of anatomical terms. In this way, expression results from assays with differing spatial resolution are recorded in a standardized and integrated manner and expression patterns can be queried at different levels of detail. GXD places the gene expression data in the larger biological context by establishing and maintaining interconnections with many other resources. Integration with MGD enables a combined analysis of genotype, sequence, expression, and phenotype data. Links to PubMed, Online Mendelian Inheritance in Man (OMIM), sequence databases, and databases from other species further enhance the utility of GXD.
Gene Ontology Consortium	The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products across databases.
GeneWeaver Access Online Associated Laboratories/Individuals The Chesler Lab Contact [email protected]	GeneWeaver is a platform for the integrative analysis of heterogeneous functional genomics data. It allows users to compare and contrast biological functions across multiple species based on the genes, gene products and variants associated with these functions in global functional genomics analyses. GeneWeaver is a database and suite of tools for the discovery of convergent relations among heterogeneous functional genomics studies of biological processes and disease related functions. It consists of a collection of integrated micro-services for the aggregation and integration of multi-species functional genomics data from curated resources, published experiments and aggregated evidence from genomic data resources. Its analysis services are built on statistical scalable combinatorial algorithms to allow users to develop custom workflows to perform set operations to compare and contrast diverse data sets. The system is accessed either through a user-friendly browser interface or an API. All analyses are repeatable on the original instance of user-selected data. A curation interface allows users to manage the storage of gene sets from literature and to perform annotation of gene sets to controlled vocabularies and ontologies.
Glaucoma Discovery Platform Access Online Associated Laboratories/Individuals The Howell Lab Contact [email protected]	The Glaucoma Discovery Platform allows a user to visualize and interrogate gene expression changes in glaucoma, based on a study done in the lab of Simon John at the Jackson Laboratory. At the time of development, existing resources such as GEO and ArrayExpress were not suitable for interrogating and visualizing such a complex dataset. Therefore, to maximize the benefit of this study for us and the wider scientific community, we developed Glaucoma Discovery Platform, a freely available web-based environment. Glaucoma Discovery Platform was developed using a suite of scripts we term Datgan. To our knowledge, no other resource was available that provided this combination of user-friendly functionality.
Human Phenotype Ontology Access Online & Download Associated Laboratories/Individuals The Robinson Lab Contact [email protected]	The Human Phenotype Ontology (HPO) provides a standardized vocabulary of phenotypic abnormalities encountered in human disease. Each term in the HPO describes a phenotypic abnormality, such as Atrial septal defect. The HPO is currently being developed using the medical literature, Orphanet, DECIPHER, and OMIM. HPO currently contains over 13,000 terms and over 156,000 annotations to hereditary diseases. The HPO project and others have developed software for phenotype-driven differential diagnostics, genomic diagnostics, and translational research. The HPO is a flagship product of the Monarch Initiative, an NIH-supported international consortium dedicated to semantic integration of biomedical and model organism data with the ultimate goal of improving biomedical research. The HPO, as a part of the Monarch Initiative, is a central component of one of the 13 driver projects in the Global Alliance for Genomics and Health (GA4GH) strategic roadmap.
I-ATAC Associated Laboratories/Individuals The Ucar Lab	ATAC-seq is a new protocol to capture open chromatin sites by performing adaptor ligation and fragmentation of open chromatin regions. Because it requires little biological sample and little library preparation time, many scientists are generating ATAC-seq libraries to decipher the chromatin landscape of DNA in a given cell type and condition of interest.
ImageEchelon Access Online & Download Associated Laboratories/Individuals The Burgess Lab Contact [email protected]	ImageEchelon is a tool to quantify images where meaningful differences are discernible by eye, but difficult to quantify using traditional methods. It was developed to quantify neuronal fasciculation in microscopy images, but can be used to rank images based on any qualitative criteria. Classical methods ask observers to score an image in isolation along a scale (e.g., 1-5), which can be difficult to control between observers, especially in a highly variable data set. ImageEchelon asks observers to compare two images and pick a “winner” and a “loser” according to some criterion, an easier and more reliable task.
Intermediate Access Download Associated Laboratories/Individuals The Churchill Lab Contact [email protected]	An R package for eQTL/pQTL mediation analysis. Example use case: deducing potential sample mixups by comparing GigaMUGA haplotype reconstructions to haplotypes deduced from an islet RNA-seq-based genotype-by-sequencing method (mentioned in Chick/Munger et al. Defining the consequences of genetic variation on a proteome–wide scale, 2016 PMID: 5292866).
International Mouse Strain Resource (IMSR) Access Online Contact [email protected]	The International Mouse Strain Resource offers users a combined catalog of worldwide mouse resources (live, cryopreserved, and embryonic stem cells), with direct access to repository sites holding those resources of interest. The International Mouse Strain Resource (IMSR) provides an online searchable web-based catalog of mouse resources available globally, including inbred, mutant, and genetically engineered mice, cryopreserved embryos and gametes, and ES cell lines. The IMSR website provides, for each strain or cell line, links for ordering, links to the repositories’ strain description, and links to phenotype and disease model data at Mouse Genome Informatics (MGI). Searches can be performed using one or many parameters, including the strain/stock designation, the strain repository or MGI ID, the state in which the strain/resource is maintained and the strain type. Genetic search parameters include the symbol or name of the phenotypic allele or gene of interest carried in the strain, and repository parameters include the name of one or more specific repositories, or the selection of all repositories in a geographical regional location.
Model-AD Consortium	Model organism development and evaluation for late-onset Alzheimer's Disease. The MODEL-AD consortium, consisting of a Center at Indiana University, The Jackson Laboratory, and Sage Bionetworks and a Center at the University of California Irvine, has been established by the National Institute on Aging to: develop the next generation of in vivo AD models based on human data; institute a standardized and rigorous process for characterization of animal models; align the pathophysiological features of AD models with corresponding stages of clinical disease using translatable biomarkers; establish guidelines for rigorous preclinical testing in animal models; and ensure rapid availability of animal models, protocols and validation data to all researchers for preclinical drug development.
Mouse Genome Database (MGD) Access Online Contact [email protected]	Mouse Genome Database (MGD) is a core knowledgebase for the laboratory mouse and is focused on providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease. MGD is a primary component of the Mouse Genome Informatics (MGI) Consortium. The Mouse Genome Database (MGD) is the international community mouse database which supports basic, translational and computational research by providing integrated data on the genetics, genomics, and biology of the laboratory mouse. MGD serves as the source for biological reference data sets related to mouse genes, gene functions, phenotypes and disease models with an increasing emphasis on the association of these data to human biology and disease. MGD is the authoritative source of mouse gene and strain nomenclature as well as annotations for mouse gene function, phenotypes and human disease models.
Mouse Models of Human Cancer Database (MMHCdb) Access Online Associated Laboratories/Individuals The Bult Lab Contact [email protected]	The Mouse Models of Human Cancer Database (formerly, the Mouse Tumor Biology database) integrates data on the frequency, incidence, genetics, and pathology of neoplastic disorders, emphasizing data on tumors that develop characteristically in different genetically defined strains of mice. MMHCdb supports the use of a mouse model system for human cancer by providing a comprehensive resource for data and information on various tumor models. The database provides access to information on and data for: spontaneous and induced tumors in mice, genetically defined mice (inbred, hybrid, mutant, and genetically engineered strains of mice) in which tumors arise, genetic factors associated with tumor susceptibility in mice, somatic genetic mutations observed in tumors, and Patient Derived Xenograft (PDX) models.
Mouse Mutant Resource Database Access Online Contact [email protected]	The MMR mouse variation database provides access to all genetic variants called from high-throughput exome and whole genome sequencing from mice exhibiting spontaneously arising Mendelian disease phenotypes. Phenotype, genetic mapping, and variant frequency metrics are also provided. Aggregated data analyses provide mutation candidate prioritization. We developed the Mouse Mutant Resource Database (https://mmrdb.jax.org) to host annotated variant calls and sample metadata and to facilitate data sorting, filtering, querying, and sharing. The database employs an algorithm for variant prioritization. The algorithm makes the following assumptions about causative variants: they will be rare (<3%) in the database, the allele ratio of the variant in the sample will fall within expectations for the sample genotype (>0.9 homozygous; 0.2–0.8 heterozygous), and the chromosomal position of the variant will be in agreement with chromosomal linkage data. We optimized the algorithm iteratively by reanalyzing exome data sets with previously confirmed, known mutations.
Mouse Phenome Database Associated Laboratories/Individuals The Chesler Lab Georgi Kolishovski, M.S., Scientific Software Engineer	This resource is a collaborative standardized collection of measured data on laboratory mouse strains and populations. It includes baseline phenotype data sets as well as studies of drug, diet, disease and aging effects. It also includes protocols, projects and publications, and SNP, variation and gene expression studies.
MouseMine Access Online Contact [email protected]	MouseMine is a powerful data warehouse providing comprehensive API (application programming interface) access to MGI data, as well as a forms-based user interface. MouseMine contains core data from MGI, including the mouse genome feature and allele catalogs, disease and phenotype annotations, expression data, publications, etc. Users can select from predefined query templates or construct custom queries and reports, iteratively refine searches, and save/reuse lists of results.
Multiple Genome Viewer Access Online Contact [email protected]	Explore and compare multiple annotated mouse genomes. The Multiple Genome Viewer (MGV) allows you to explore and compare chromosomal regions and synteny blocks between the C57BL/6J reference genome and 18 other mouse inbred strains: 16 sequenced and annotated by the Wellcome Sanger Institute Mouse Genomes Project and two (CAROLI/EiJ and PAHARI/EiJ) published by Paul Flicek, Duncan Odom and others.
PDX Development and Trial Centers Research Network Access Online Associated Laboratories/Individuals The Chuang Lab Jeffrey Chuang, Ph.D., Professor Contact [email protected]	PDXNet is an NCI-sponsored consortium that uses patient-derived xenografts to accelerate translational research for the broader research community. The Chuang lab has been co-leader of the Data Coordination Center for this consortium since 2017.
PDX Finder Access Online Associated Laboratories/Individuals The Chuang Lab Jeffrey Chuang, Ph.D., Professor Contact [email protected]	PDX Finder is an open global cancer research portal to patient derived xenograft (PDX) models. Patient-derived tumor xenograft (PDX) mouse models are a versatile oncology research platform for studying tumor biology and for testing chemotherapeutic approaches tailored to genomic characteristics of individual patients’ tumors. PDX models are generated and distributed by a diverse group of academic labs, multi-institution consortia, and contract research organizations. The distributed nature of PDX repositories and the use of different metadata standards present a significant challenge to finding PDX models relevant to specific cancer research questions. The Jackson Laboratory and EMBL-EBI are addressing these challenges by co-developing PDX Finder, a comprehensive open global catalog of PDX models and their associated datasets. Within PDX Finder, model attributes are harmonized and integrated using a previously developed community minimal information standard to support consistent searching across the originating resources. Links to repositories are provided from the PDX Finder search results to facilitate model acquisition and/or collaboration.
QTL Viewer Access Online & Download Associated Laboratories/Individuals The Churchill Lab Contact [email protected]	QTL Viewer is an interactive web-based analysis tool that allows users to replicate the analyses reported for a study (for example, the viewer at http://churchill-lab.jax.org/qtl/islet/DO378 represents the data published in the paper Keller, et al. Genetic Drivers of Pancreatic Islet Function, PMID 29567659). It includes the ability to search various subsets of data from a study, such as phenotypes or expression data, and then visualize data with profile, correlation, LOD, effect, mediation and SNP association plots.
QuIN Associated Laboratories/Individuals The Ucar Lab	QuIN (Query tool for Interaction Networks, available at quin.jax.org) is a tool for visualizing, annotating, and querying chromatin interactions derived from technologies such as ChIA-PET or HiC.
Random Circuit Perturbation (sRACIPE) Access Download Contact [email protected]	sRACIPE is a systems-biology modeling method which takes the gene regulatory circuit topology as the only input, and simulates an ensemble of models with random kinetic parameters at multiple noise levels. Statistical analysis of the generated gene expressions reveals the basin of attraction and stability of various phenotypic states and their changes associated with intrinsic and extrinsic noises yielding new insights on the structure and function of gene regulatory networks.
RFR-PEL Associated Laboratories/Individuals The Ucar Lab	Random Forest Regression for Epigenetic Length prediction
SARNAclust Access Download Associated Laboratories/Individuals The Chuang Lab Contact [email protected]	SARNAclust is a novel semi-automatic algorithm to identify RNA-protein binding motifs from immunoprecipitation data. SARNAclust is the first unsupervised method to identify and deconvolve multiple RNA sequence/structure motifs simultaneously. SARNAclust computes similarity between sequence/structure objects using a graph kernel, providing the ability to isolate the impact of specific features through the bulge graph formalism. For full details see: https://doi.org/10.1371/journal.pcbi.1006078 Dotu I, Adamson SI, Coleman B, Fournier C, Ricart-Altimiras E, Eyras E, et al. (2018) SARNAclust: Semi-automatic detection of RNA protein binding motifs from immunoprecipitation data. PLoS Comput Biol 14(3): e1006078.
WORMHOLE Access Online & Download Associated Laboratories/Individuals The Korstanje Lab Ron Korstanje, Ph.D., Professor, Evnin Family Chair Contact The Korstanje Lab	The WORM Human OrthoLogy Explorer is a meta-tool that uses machine learning to predict novel least diverged orthologs (LDOs) by integrating ortholog predictions from 17 algorithms. Support vector machine (SVM) classifiers are trained to distinguish whether a gene is or is not an LDO by comparing the predictions of the constituent algorithms across a set of high-confidence examples of known LDOs (the PANTHER LDOs). Originally conceived to predict orthologs between humans and worms, the scope was later expanded to include five commonly used eukaryotic model organisms: humans (Homo sapiens), mice (Mus musculus), zebrafish (Danio rerio), fruit flies (Drosophila melanogaster), and nematodes (Caenorhabditis elegans). The WORMHOLE SVMs are used to calculate LDO confidence scores (aka WORMHOLE Scores) for genome-wide gene pairs between combinations of species.
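
The three assumptions listed in the Mouse Mutant Resource Database entry above (rarity below 3%, an allele ratio matching the sample genotype, and agreement with linkage data) can be illustrated with a small filter. This is only a sketch under stated assumptions, not the database's actual implementation: the Variant record, its field names, the is_candidate function, and the representation of linkage data as a single (chromosome, start, end) interval are all hypothetical, and the treatment of the boundary values (0.9, 0.2, 0.8) is a guess.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Variant:
    """Hypothetical record for one called variant in one sample."""
    chrom: str
    pos: int
    db_frequency: float  # fraction of database samples carrying the variant
    allele_ratio: float  # alternate-allele read fraction in this sample


def is_candidate(
    variant: Variant,
    genotype: str,
    linked_region: Optional[Tuple[str, int, int]] = None,
) -> bool:
    """Apply the three stated assumptions about causative variants."""
    # 1. Rare in the database (<3%).
    rare = variant.db_frequency < 0.03

    # 2. Allele ratio consistent with the sample genotype.
    if genotype == "homozygous":
        ratio_ok = variant.allele_ratio > 0.9
    elif genotype == "heterozygous":
        ratio_ok = 0.2 <= variant.allele_ratio <= 0.8
    else:
        raise ValueError(f"unknown genotype: {genotype!r}")

    # 3. Position agrees with linkage data, when any is supplied.
    #    Assumption: linkage is given as one (chrom, start, end) interval.
    if linked_region is None:
        linked = True
    else:
        chrom, start, end = linked_region
        linked = variant.chrom == chrom and start <= variant.pos <= end

    return rare and ratio_ok and linked
```

In this sketch, a variant must satisfy all three conditions to be kept; a real prioritization scheme might instead rank variants by how well they meet each condition.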
