Tools, Data & Databases

This site lists tools and resources related to data science that have been developed by JAX faculty and staff or that support research collaborations with JAX investigators. No animal resources are listed here. For a listing that includes animal resources, visit the Research Tools and Resources page.


CloudNeo

Access

Online & Download

Contact

jeff.chuang@jax.org

CloudNeo Software Tools & Analysis
CloudNeo is a cloud-based computational workflow for identifying patient-specific tumor neoantigens from next generation sequencing data.

DO-AS cohort data

Associated Individuals/Groups

DO-AS cohort data Software Tools & Analysis
Phenotype and RNAseq data from approx. 200 (B6-Col4a5 KO x DO)F1 mice

EMASE

Access

Download

Associated Individuals/Groups

Contact

gary.churchill@jax.org

EMASE Software Tools & Analysis
An expectation maximization algorithm for allele specific expression. Primary author K. Choi of the Churchill Lab.

FusorSV

Contact

Ankit.Malhotra@jax.org

FusorSV Software Tools & Analysis
FusorSV is a data mining-based framework that allows for comprehensive and robust detection of Structural Variations (SV) from next generation sequencing datasets. We built the Structural Variation Engine (SVE), which bundles all of the tools, including FusorSV, needed to analyze new datasets. SVE also includes data models built using 1000 Genomes SV callsets as ground truth.
Comprehensive and accurate Structural Variation (SV) discovery from next generation sequencing data remains a major challenge. A popular approach to overcoming the performance limitations of existing SV-calling algorithms is to use complementary algorithms to determine the SV loci and then merge them in a heuristic manner. However, such approaches do not consider the strengths and weaknesses of individual algorithms and either under- or over-merge the variant loci, resulting in false SV calls. Here, we present FusorSV, an open source tool that uses a data mining approach to assess performance and merge callsets from an ensemble of SV-calling algorithms. We developed a FusorSV SV calling model using an ensemble of eight SV-calling algorithms for the analysis of 27 deep-coverage human genomes from the 1000 Genomes Project (1000GP). For an easy-to-use SV detection pipeline, we also built the Structural Variation Engine (SVE), which is capable of performing gold standard SV analysis for whole genome sequencing projects.

g2g Tools

Access

Download

Associated Individuals/Groups

Contact

gary.churchill@jax.org

g2g Tools Software Tools & Analysis
Genome editing tools. Creates custom genomes by incorporating (phased) SNPs and indels into a reference genome; extracts regions of interest, e.g., exons or transcripts, from custom genomes; and converts the coordinates of files (bam, gtf, bed) between two genomes.

gbrs

Access

Download

Associated Individuals/Groups

Contact

gary.churchill@jax.org

gbrs Software Tools & Analysis
Genotype-free genome reconstruction and ASE quantification.
Example use case: deducing potential sample mixups by comparing GigaMUGA haplotype reconstructions to haplotypes deduced from an islet RNA-seq-based genotype-by-sequencing method (mentioned in Chick/Munger et al., Defining the consequences of genetic variation on a proteome-wide scale, 2016, PMCID: PMC5292866).

Gene Expression Database (GXD)

Access

Online

Contact

mgi-help@jax.org

Gene Expression Database (GXD) Database
Gene Expression Database (GXD) is a database project that integrates different types of gene expression information from the mouse and provides a searchable index of published experiments on endogenous gene expression during development. GXD is a core component of the Mouse Genome Informatics (MGI) resource, which is the international community database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease.
GXD stores primary data from different types of expression assays. By integrating these data, GXD provides, as data accumulate, increasingly complete information about the expression profiles of transcripts and proteins in different mouse strains and mutants. GXD also describes expression patterns using an extensive, hierarchically-structured dictionary of anatomical terms. In this way, expression results from assays with differing spatial resolution are recorded in a standardized and integrated manner and expression patterns can be queried at different levels of detail. GXD places the gene expression data in the larger biological context by establishing and maintaining interconnections with many other resources. Integration with MGD enables a combined analysis of genotype, sequence, expression, and phenotype data. Links to PubMed, Online Mendelian Inheritance in Man (OMIM), sequence databases, and databases from other species further enhance the utility of GXD.

GeneWeaver

Access

Online

Associated Individuals/Groups

Contact

elissa.chesler@jax.org

GeneWeaver Database|Dataset|Software Tools & Analysis
GeneWeaver is a platform for the integrative analysis of heterogeneous functional genomics data. It allows users to compare and contrast biological functions across multiple species based on the genes, gene products and variants associated with these functions in global functional genomics analyses.
GeneWeaver is a database and suite of tools for the discovery of convergent relations among heterogeneous functional genomics studies of biological processes and disease related functions. It consists of a collection of integrated micro-services for the aggregation and integration of multi-species functional genomics data from curated resources, published experiments and aggregated evidence from genomic data resources. Its analysis services are built on statistical scalable combinatorial algorithms to allow users to develop custom workflows to perform set operations to compare and contrast diverse data sets. The system is accessed either through a user-friendly browser interface or an API. All analyses are repeatable on the original instance of user-selected data. A curation interface allows users to manage the storage of gene sets from literature and to perform annotation of gene sets to controlled vocabularies and ontologies.

GFR Calculator

Associated Individuals/Groups

GFR Calculator Software Tools & Analysis
This tool calculates the glomerular filtration rate in mice using the FITC-inulin method.

HaploQA

Access

Online & Download

Associated Individuals/Groups

Contact

haploqa@jax.org

HaploQA Software Tools & Analysis
A web application for performing haplotype analysis of genotype calls from the “MUGA” platform genotyping arrays.
The application was developed at The Jackson Laboratory to facilitate genetic quality assurance of mice using genotype data derived from the “MUGA” platform genotyping arrays. The tool allows the community to examine publicly released data sets by viewing karyotype plots generated from haplotype reconstructions. An individual can also contact the team to register for an account, which allows them to upload their own data (MegaMUGA or GigaMUGA genotypes), have haplotype reconstructions run, examine their data privately, and, if they choose, share that data publicly. An individual can also set up their own private instance, with source code available here: https://github.com/TheJacksonLaboratory/haploqa

HSA

Associated Individuals/Groups

HSA Resource
Integrating multi-track Hi-C data for genome-scale reconstruction of 3D chromatin structure

I-ATAC

Associated Individuals/Groups

I-ATAC Resource
ATAC-seq is a new protocol to capture open chromatin sites by performing adaptor ligation and fragmentation of open chromatin regions. Because it requires little biological sample and a short library preparation time, many scientists are generating ATAC-seq libraries to decipher the chromatin landscape of DNA in a given cell type and condition of interest.

ImageEchelon

Access

Online & Download

Associated Individuals/Groups

Contact

dave.walton@jax.org

ImageEchelon Software Tools & Analysis
Tool to quantify images where meaningful differences are discernible by eye, but difficult to quantify using traditional methods.
Image Echelon is a tool to quantify images where meaningful differences are discernible by eye, but difficult to quantify using traditional methods. It was developed to quantify neuronal fasciculation in microscopy images, but can be used to rank images based on any qualitative criterion. Classical methods ask observers to score an image in isolation along a scale (e.g., 1-5), which can be difficult to control between observers, especially in a highly variable data set. Image Echelon instead asks observers to compare two images and pick a “winner” and a “loser” according to some criterion, an easier and more reliable task.
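To illustrate how pairwise "winner"/"loser" judgments can be turned into a ranking, here is a minimal sketch using an Elo-style rating update. This is an assumption made for illustration only: the description above does not say which ranking scheme Image Echelon uses, and the function name `elo_rank`, the starting rating, and the `k` factor are all hypothetical.

```python
def elo_rank(comparisons, k=32.0, start=1500.0):
    """Rank items from (winner, loser) pairs with an Elo-style update.

    Hypothetical illustration; not confirmed to be Image Echelon's method.
    """
    ratings = {}
    for winner, loser in comparisons:
        rw = ratings.setdefault(winner, start)
        rl = ratings.setdefault(loser, start)
        # Probability that the winner was expected to win, given current ratings.
        expected_w = 1.0 / (1.0 + 10 ** ((rl - rw) / 400.0))
        # Unexpected wins move the ratings more than expected ones.
        delta = k * (1.0 - expected_w)
        ratings[winner] = rw + delta
        ratings[loser] = rl - delta
    # Highest-rated image first.
    return sorted(ratings, key=ratings.get, reverse=True)


if __name__ == "__main__":
    votes = [("img_a", "img_b"), ("img_b", "img_c"), ("img_a", "img_c")]
    print(elo_rank(votes))
```

Each observer only ever answers "which of these two?", yet the accumulated votes still yield a full ordering of the image set.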

Intermediate

Access

Download

Associated Individuals/Groups

Contact

gary.churchill@jax.org

Intermediate Software Tools & Analysis
An R package for eQTL/pQTL mediation analysis.
Example use case: deducing potential sample mixups by comparing GigaMUGA haplotype reconstructions to haplotypes deduced from an islet RNA-seq-based genotype-by-sequencing method (mentioned in Chick/Munger et al., Defining the consequences of genetic variation on a proteome-wide scale, 2016, PMCID: PMC5292866).

Clinical KnowledgeBase (CKB)

Access

Online & Download

Contact

ckbsupport@jax.org

Clinical KnowledgeBase (CKB) Database
CKB is a dynamic, digital encyclopedia for precision oncology that connects cancer variants to therapies, efficacy evidence, and clinical trials and aids interpretation of complex cancer genomic profiles.
Advances in precision oncology have created a demand for scalable interpretative tools that address the growing complexity of genetic/genomic data and corresponding treatment modalities. The Jackson Laboratory’s Clinical Knowledgebase (CKB) is a leading resource in the effort to provide evidence-based information to clinicians, researchers and ultimately patients. Initially developed to support the Laboratory's in-house Clinical Genomics laboratory, CKB is an expertly curated and publicly accessible relational knowledgebase of gene variants, targeted therapies, efficacy evidence and clinical trials related to cancer. Molecular variations currently include somatic mutations, copy number variations, fusions, tumor mutational burden, microsatellite instability, and expression. The molecular entities are built into molecular profiles and are assigned to relevant treatment approaches. The therapeutic response also takes into account indication, which is represented by an integrated disease ontology. CKB provides evidence-based information and allows visibility into potential opportunities for research that could reveal novel cancer treatments.

Model-AD Consortium

Model-AD Consortium Resource
Model organism development and evaluation for late-onset Alzheimer's Disease.

The MODEL-AD consortium, consisting of a Center at Indiana University, The Jackson Laboratory, and Sage Bionetworks and a Center at the University of California Irvine, has been established by the National Institute on Aging to:

  • Develop the next generation of in vivo AD models based on human data
  • Institute a standardized and rigorous process for characterization of animal models
  • Align the pathophysiological features of AD models with corresponding stages of clinical disease using translatable biomarkers
  • Establish guidelines for rigorous preclinical testing in animal models
  • Ensure rapid availability of animal models, protocols and validation data to all researchers for preclinical drug development

Mouse Models of Human Cancer Database (MMHCdb)

Access

Online

Contact

mgi-help@jax.org

Mouse Models of Human Cancer Database (MMHCdb) Database
The Mouse Models of Human Cancer Database (formerly, the Mouse Tumor Biology database) integrates data on the frequency, incidence, genetics, and pathology of neoplastic disorders, emphasizing data on tumors that develop characteristically in different genetically defined strains of mice.
MMHCdb supports the use of the mouse as a model system for human cancer by providing a comprehensive resource for data and information on various tumor models. The database provides access to information on and data for: spontaneous and induced tumors in mice, genetically defined mice (inbred, hybrid, mutant, and genetically engineered strains of mice) in which tumors arise, genetic factors associated with tumor susceptibility in mice, somatic genetic mutations observed in tumors, and Patient Derived Xenograft (PDX) models.

Mouse Genome Database (MGD)

Access

Online

Contact

mgi-help@jax.org

Mouse Genome Database (MGD) Database
Mouse Genome Database (MGD) is a core knowledgebase for the laboratory mouse and is focused on providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease. MGD is a primary component of the Mouse Genome Informatics (MGI) Consortium.
The Mouse Genome Database (MGD) is the international community mouse database which supports basic, translational and computational research by providing integrated data on the genetics, genomics, and biology of the laboratory mouse. MGD serves as the source for biological reference data sets related to mouse genes, gene functions, phenotypes and disease models with an increasing emphasis on the association of these data to human biology and disease. MGD is the authoritative source of mouse gene and strain nomenclature as well as annotations for mouse gene function, phenotypes and human disease models.

MouseMine

Access

Online

Contact

mgi-help@jax.org

MouseMine Database
MouseMine is a powerful data warehouse providing comprehensive API (application programming interface) access to MGI data, as well as a forms-based user interface.
MouseMine contains core data from MGI, including the mouse genome feature and allele catalogs, disease and phenotype annotations, expression data, publications, etc. Users can select from predefined query templates or construct custom queries and reports, iteratively refine searches, and save/reuse lists of results.

OncoCL

Associated Individuals/Groups

OncoCL Resource
OncoCL, an ontology to describe cancer cell types

Gene Ontology Consortium

Gene Ontology Consortium Resource
The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products across databases.

PDX Finder

Access

Online

Contact

helpdesk@pdxfinder.org

PDX Finder Database
PDX Finder is an open global cancer research portal to patient derived xenograft (PDX) models.
Patient-derived tumor xenograft (PDX) mouse models are a versatile oncology research platform for studying tumor biology and for testing chemotherapeutic approaches tailored to the genomic characteristics of individual patients’ tumors. PDX models are generated and distributed by a diverse group of academic labs, multi-institution consortia, and contract research organizations. The distributed nature of PDX repositories and the use of different metadata standards present a significant challenge to finding PDX models relevant to specific cancer research questions. The Jackson Laboratory and EMBL-EBI are addressing these challenges by co-developing PDX Finder, a comprehensive open global catalog of PDX models and their associated datasets. Within PDX Finder, model attributes are harmonized and integrated using a previously developed community minimal information standard to support consistent searching across the originating resources. Links to repositories are provided from the PDX Finder search results to facilitate model acquisition and/or collaboration.

PDX Development and Trial Centers Research Network

Access

Online

Contact

jeff.chuang@jax.org

PDX Development and Trial Centers Research Network Database
PDXNet is an NCI-sponsored consortium that uses patient-derived xenografts to accelerate translational research for the broader research community. The Chuang lab has been co-leader of the Data Coordination Center for this consortium since 2017.

Mouse Phenome Database

Mouse Phenome Database Database|Resource
This resource is a collaborative, standardized collection of measured data on laboratory mouse strains and populations. It includes baseline phenotype data sets as well as studies of drug, diet, disease and aging effects. It also includes protocols, projects and publications, and SNP, variation and gene expression studies.

Human Phenotype Ontology

Access

Online & Download

Contact

peter.robinson@jax.org

Human Phenotype Ontology Database
The Human Phenotype Ontology (HPO) provides a standardized vocabulary of phenotypic abnormalities encountered in human disease.
Each term in the HPO describes a phenotypic abnormality, such as Atrial septal defect. The HPO is currently being developed using the medical literature, Orphanet, DECIPHER, and OMIM. HPO currently contains over 13,000 terms and over 156,000 annotations to hereditary diseases. The HPO project and others have developed software for phenotype-driven differential diagnostics, genomic diagnostics, and translational research. The HPO is a flagship product of the Monarch Initiative, an NIH-supported international consortium dedicated to semantic integration of biomedical and model organism data with the ultimate goal of improving biomedical research. The HPO, as a part of the Monarch Initiative, is a central component of one of the 13 driver projects in the Global Alliance for Genomics and Health (GA4GH) strategic roadmap.

Digital Pathology Pipeline

Associated Individuals/Groups

Digital Pathology Pipeline Software Tools & Analysis
Tools for automatic glomerulus identification and histological quantification from scanned PAS slides

Glaucoma Discovery Platform

Access

Online

Associated Individuals/Groups

Contact

gareth.howell@jax.org

Glaucoma Discovery Platform Database|Dataset|Software Tools & Analysis
Visualize and interrogate gene expression changes in glaucoma
The Glaucoma Discovery Platform allows a user to visualize and interrogate gene expression changes in glaucoma, based on a study done in the lab of Simon John at The Jackson Laboratory. At the time of development, existing resources such as GEO and ArrayExpress were not suitable for interrogating and visualizing such a complex dataset. Therefore, to maximize the benefit of this study for us and the wider scientific community, we developed the Glaucoma Discovery Platform, a freely available web-based environment. It was developed using a suite of scripts we term Datgan. To our knowledge, no other resource was available that provided this combination of user-friendly functionality.

AMP-AD Knowledge Portal

Access

Online & Download

Contact

ampadportal@sagebionetworks.org

AMP-AD Knowledge Portal Database|Dataset
The AMP-AD Knowledge Portal is a platform to access data, analytical results and tools generated within the National Institute on Aging’s AD Translational Research Program. The majority of the content in the Portal is genomic data generated from human samples or experimental model systems. The Portal also contains analytical results and data summaries.
The AMP-AD Knowledge Portal is a platform to access data, analytical results and tools generated within the National Institute on Aging’s AD Translational Research Program. All projects within this program operate as an open science collaboration and openly share resources early in the research life cycle for evaluation and reuse. The Portal is named for the first consortium to be initiated, the Accelerating Medicines Partnership in Alzheimer’s Disease Target Discovery and Preclinical Validation Project (AMP-AD). The majority of the content in the Portal is genomic data generated from human samples or experimental model systems, including the MODEL-AD Consortium. The Portal also contains bioinformatic analytical results, some generated by teams within individual projects and some generated through multi-team working groups. Bioinformaticians can download these resources for use in their own research. You will need a Synapse account to access the data in the AMP-AD Knowledge Portal. The AMP-AD Knowledge Portal is funded by the National Institute on Aging. It is developed and maintained by Sage Bionetworks.

QuIN

Associated Individuals/Groups

QuIN Resource
QuIN (Query tool for Interaction Networks, available at quin.jax.org) is a tool for visualizing, annotating, and querying chromatin interactions derived from technologies such as ChIA-PET or Hi-C.

R/DOQTL

Access

Download

Associated Individuals/Groups

Contact

gary.churchill@jax.org

R/DOQTL Software Tools & Analysis
DOQTL is a quantitative trait locus (QTL) mapping pipeline designed for Diversity Outbred mice and other multi-parent outbred populations.

Random Circuit Perturbation (sRACIPE)

Access

Download

Contact

Vivek.Kohar@jax.org

Random Circuit Perturbation (sRACIPE) Software Tools & Analysis
sRACIPE is a systems-biology modeling method that takes the gene regulatory circuit topology as its only input and simulates an ensemble of models with random kinetic parameters at multiple noise levels. Statistical analysis of the simulated gene expression reveals the basins of attraction and stability of various phenotypic states, and how they change with intrinsic and extrinsic noise, yielding new insights into the structure and function of gene regulatory networks.
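The idea of simulating an ensemble of randomly parameterized models at several noise levels can be sketched as follows. This is a toy illustration and not the sRACIPE package or its API: the circuit is hard-coded as a two-gene mutual-repression "toggle switch", and the parameter ranges, the Euler-Maruyama integrator, and every function name are assumptions chosen for brevity.

```python
import random


def repress(x, threshold, n):
    """Decreasing Hill function: near 1 when x << threshold, near 0 when x >> threshold."""
    return 1.0 / (1.0 + (x / threshold) ** n)


def random_parameters(rng):
    """Draw one random kinetic parameter set (ranges are illustrative assumptions)."""
    return {
        "g_a": rng.uniform(1.0, 100.0), "g_b": rng.uniform(1.0, 100.0),   # production rates
        "k_a": rng.uniform(0.1, 1.0), "k_b": rng.uniform(0.1, 1.0),       # degradation rates
        "thr_a": rng.uniform(1.0, 50.0), "thr_b": rng.uniform(1.0, 50.0),  # Hill thresholds
        "n": rng.randint(1, 6),                                            # Hill coefficient
    }


def simulate(params, noise, rng, steps=2000, dt=0.01):
    """Euler-Maruyama integration of genes A and B, which repress each other."""
    a, b = rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0)
    sqrt_dt = dt ** 0.5
    for _ in range(steps):
        da = params["g_a"] * repress(b, params["thr_b"], params["n"]) - params["k_a"] * a
        db = params["g_b"] * repress(a, params["thr_a"], params["n"]) - params["k_b"] * b
        # Additive noise scaled by sqrt(dt); expression levels are clipped at zero.
        a = max(0.0, a + da * dt + noise * sqrt_dt * rng.gauss(0.0, 1.0))
        b = max(0.0, b + db * dt + noise * sqrt_dt * rng.gauss(0.0, 1.0))
    return a, b


def run_ensemble(n_models, noise_levels, seed=0):
    """Simulate the same random parameter sets at each noise level."""
    rng = random.Random(seed)
    models = [random_parameters(rng) for _ in range(n_models)]
    return {noise: [simulate(p, noise, rng) for p in models] for noise in noise_levels}


if __name__ == "__main__":
    for noise, states in run_ensemble(5, [0.0, 1.0]).items():
        print(noise, [(round(a, 1), round(b, 1)) for a, b in states])
```

Comparing the distribution of final (A, B) states across noise levels is the kind of statistical analysis the description refers to; a real analysis would use many more models and initial conditions.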

Cre Portal

Access

Online

Contact

mgi-help@jax.org

Cre Portal Database
The Cre Portal contains curated data about all recombinase-containing transgenes and knock-ins developed in mice to provide a comprehensive resource delineating known recombinase activity patterns and allowing users to find relevant mouse resources for their studies.
Conditional mutagenesis is a powerful technique that allows studies of gene function where knockout homozygotes are lethal or where the mutation affects multiple systems. This technique is dependent on activity of a recombinase in a tissue or life stage of interest. Information on cre activity and specificity in mice that express recombinases is collected in the Cre Portal. Mouse resources expressing recombinases in a spatial or temporal manner can be queried by tissue or by driver of recombinase expression. Autocomplete of these search boxes will show all available terms; terms in bold indicate data is available for that term. Downloadable reports, data metrics and links to related queries are also available. Data are also available at www.mousemine.org via custom queries and API.

Mouse Mutant Resource Database

Access

Online

Contact

laura.reinholdt@jax.org

Mouse Mutant Resource Database Database|Software Tools & Analysis
The MMR mouse variation database provides access to all genetic variants called from high-throughput exome and whole genome sequencing from mice exhibiting spontaneously arising Mendelian disease phenotypes. Phenotype, genetic mapping, and variant frequency metrics are also provided. Aggregated data analyses provide mutation candidate prioritization.
We developed the Mouse Mutant Resource Database (https://mmrdb.jax.org) to host annotated variant calls and sample metadata and to facilitate data sorting, filtering, querying, and sharing. The database employs an algorithm for variant prioritization. The algorithm makes the following assumptions about causative variants: they will be rare (<3%) in the database, the allele ratio of the variant in the sample will fall within expectations for the sample genotype (>0.9 homozygous; 0.2–0.8 heterozygous), and the chromosomal position of the variant will be in agreement with chromosomal linkage data. We optimized the algorithm iteratively by reanalyzing exome data sets with previously confirmed, known mutations.
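The three stated assumptions can be expressed as a simple filter. The sketch below is illustrative only and is not the database's actual implementation: the `Variant` fields, the function name `is_candidate`, and the representation of linkage data as a single chromosomal interval are hypothetical; only the thresholds (<3% frequency, >0.9 homozygous, 0.2–0.8 heterozygous) come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    chrom: str
    pos: int
    db_frequency: float  # fraction of samples in the database carrying the variant
    allele_ratio: float  # alternate-allele read fraction in this sample


def is_candidate(variant, genotype, linked_region=None, max_frequency=0.03):
    """Return True if the variant satisfies all three prioritization assumptions.

    genotype: "homozygous" or "heterozygous" (expected state of the causative variant).
    linked_region: optional (chrom, start, end) from genetic mapping; hypothetical format.
    """
    # 1. Causative variants are assumed to be rare (<3%) in the database.
    if variant.db_frequency >= max_frequency:
        return False
    # 2. The allele ratio must match the expected sample genotype.
    if genotype == "homozygous":
        ratio_ok = variant.allele_ratio > 0.9
    elif genotype == "heterozygous":
        ratio_ok = 0.2 <= variant.allele_ratio <= 0.8
    else:
        raise ValueError(f"unknown genotype: {genotype!r}")
    if not ratio_ok:
        return False
    # 3. The position must agree with chromosomal linkage data, when available.
    if linked_region is not None:
        chrom, start, end = linked_region
        if variant.chrom != chrom or not (start <= variant.pos <= end):
            return False
    return True


if __name__ == "__main__":
    v = Variant(chrom="7", pos=1_250_000, db_frequency=0.01, allele_ratio=0.97)
    print(is_candidate(v, "homozygous", linked_region=("7", 1_000_000, 2_000_000)))
```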

RFR-PEL

Associated Individuals/Groups

RFR-PEL Resource
Random Forest Regression for Epigenetic Length prediction

SARNAclust

Access

Download

Contact

jeff.chuang@jax.org

SARNAclust Software Tools & Analysis
SARNAclust is a novel semi-automatic algorithm to identify RNA-protein binding motifs from immunoprecipitation data.
SARNAclust is the first unsupervised method to identify and deconvolve multiple RNA sequence/structure motifs simultaneously. SARNAclust computes similarity between sequence/structure objects using a graph kernel, providing the ability to isolate the impact of specific features through the bulge graph formalism. For full details see: Dotu I, Adamson SI, Coleman B, Fournier C, Ricart-Altimiras E, Eyras E, et al. (2018) SARNAclust: Semi-automatic detection of RNA protein binding motifs from immunoprecipitation data. PLoS Comput Biol 14(3): e1006078. https://doi.org/10.1371/journal.pcbi.1006078

MAV-seq

MAV-seq Resource
MAV-seq (Management, Analysis, Visualization of Sequence data) is an interactive, user friendly, cross platform, secure, encrypted, automated, customized, centralized, multi-roles based database application for the management of sample repertoires and automation of the data pre-processing of epigenomic and transcriptomic data.

Seqfold

Associated Individuals/Groups

Seqfold Resource
SeqFold is a tool for RNA secondary structure prediction from experimental data.

International Mouse Strain Resource (IMSR)

Access

Online

Contact

mgi-help@jax.org

International Mouse Strain Resource (IMSR) Database
The International Mouse Strain Resource offers users a combined catalog of worldwide mouse resources (live, cryopreserved, and embryonic stem cells), with direct access to repository sites holding those resources of interest.
The International Mouse Strain Resource (IMSR) provides an online searchable web-based catalog of mouse resources available globally, including inbred, mutant, and genetically engineered mice, cryopreserved embryos and gametes, and ES cell lines. The IMSR website provides, for each strain or cell line, links for ordering, links to the repositories’ strain description, and links to phenotype and disease model data at Mouse Genome Informatics (MGI). Searches can be performed using one or many parameters, including the strain/stock designation, the strain repository or MGI ID, the state in which the strain/resource is maintained and the strain type. Genetic search parameters include the symbol or name of the phenotypic allele or gene of interest carried in the strain, and repository parameters include the name of one or more specific repositories, or the selection of all repositories in a geographical regional location.

aln Tools

Access

Download

Associated Individuals/Groups

Contact

gary.churchill@jax.org

aln Tools Software Tools & Analysis
Processes NGS alignments into a sparse compressed incidence matrix and stores it in a pre-defined binary format for efficient storage and downstream analysis.

Tumor Fusions

Access

Online

Contact

roel.verhaak@jax.org

Tumor Fusions Dataset
Based on an integrated analysis of paired-end RNA sequencing and DNA copy number data from The Cancer Genome Atlas (TCGA), the Tumor Fusion Gene Data Portal provides a bona fide fusion list across many tumor types.
Transcript fusion resulting from genomic rearrangement is an important class of somatic alteration, both as a cancer-initiating event and as a molecular therapeutic target for specific tumors. Our Pipeline for RNA sequencing Data Analysis (PRADA) enables us to detect fusion transcripts comprehensively and with high confidence.

Multiple Genome Viewer

Access

Online

Contact

mgi-help@jax.org

Multiple Genome Viewer Software Tools & Analysis
Explore and compare multiple annotated mouse genomes.
The Multiple Genome Viewer (MGV) allows you to explore and compare chromosomal regions and synteny blocks between the C57BL/6J reference genome and 18 other inbred mouse strains: 16 sequenced and annotated by the Wellcome Sanger Institute's Mouse Genomes Project and two (CAROLI/EiJ and PAHARI/EiJ) published by Paul Flicek, Duncan Odom, and others.

QTL Viewer

Access

Online & Download

Contact

gary.churchill@jax.org

QTL Viewer Software Tools & Analysis
Interactive web-based analysis tool that will allow users to replicate analyses reported for a study.
QTL Viewer is an interactive web-based analysis tool that allows users to replicate the analyses reported for a study. For example, the viewer at http://churchill-lab.jax.org/qtl/islet/DO378 represents the data published in Keller et al., Genetic Drivers of Pancreatic Islet Function, PMID 29567659. It includes the ability to search various subsets of data from a study, such as phenotypes or expression data, and then visualize the data with profile, correlation, LOD, effect, mediation and SNP association plots.

WORMHOLE

Access

Online & Download

Associated Individuals/Groups

Contact

The Korstanje Lab

WORMHOLE Database
The WORM Human OrthoLogy Explorer is a meta-tool that uses machine learning to predict novel least diverged orthologs (LDOs) by integrating ortholog predictions from 17 algorithms.
Support vector machine (SVM) classifiers are trained to distinguish whether a gene is or is not an LDO by comparing the predictions of the constituent algorithms across a set of high-confidence examples of known LDOs (the PANTHER LDOs). Originally conceived to predict orthologs between humans and worms, the scope was later expanded to include five commonly used eukaryotic model organisms: humans (Homo sapiens), mice (Mus musculus), zebrafish (Danio rerio), fruit flies (Drosophila melanogaster), and nematodes (Caenorhabditis elegans). The WORMHOLE SVMs are used to calculate LDO confidence scores (aka WORMHOLE Scores) for genome-wide gene pairs between each combination of species.