Collaborative Cross and Diversity Outbred Reference Data

CC/DO Founder Names, Letters, and Colors

Strain       JAX Stock No.  Letter  Color (hex)  Color (RGB)
A/J          000646         A       #F0E442      240, 228, 66
C57BL/6J     000664         B       #555555      85, 85, 85
129S1/SvImJ  002448         C       #E69F00      230, 159, 0
NOD/ShiLtJ   001976         D       #0072B2      0, 114, 178
NZO/HlLtJ    002105         E       #56B4E9      86, 180, 233
CAST/EiJ     000928         F       #009E73      0, 158, 115
PWK/PhJ      003715         G       #D55E00      213, 94, 0
WSB/EiJ      001145         H       #CC79A7      204, 121, 167
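The founder letters and hex colors listed above can be kept in a small lookup for plotting. The following is a minimal sketch using only the Python standard library. The values are copied from the founder table; the names `CC_FOUNDERS`, `hex_to_rgb`, and `founder_palette` are illustrative and are not part of any JAX or R/qtl2 API.

```python
# Founder letter -> (strain name, hex color), copied from the founder table.
CC_FOUNDERS = {
    "A": ("A/J", "#F0E442"),
    "B": ("C57BL/6J", "#555555"),
    "C": ("129S1/SvImJ", "#E69F00"),
    "D": ("NOD/ShiLtJ", "#0072B2"),
    "E": ("NZO/HlLtJ", "#56B4E9"),
    "F": ("CAST/EiJ", "#009E73"),
    "G": ("PWK/PhJ", "#D55E00"),
    "H": ("WSB/EiJ", "#CC79A7"),
}


def hex_to_rgb(hex_color):
    """Convert '#RRGGBB' to an (R, G, B) tuple of integers in 0..255."""
    digits = hex_color.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def founder_palette():
    """Return {founder letter: (R, G, B)} for the eight founders."""
    return {letter: hex_to_rgb(color) for letter, (_, color) in CC_FOUNDERS.items()}


palette = founder_palette()
```

Converting from the hex column reproduces the RGB column of the table, so only one of the two needs to be stored.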

Purchasing Genetically Diverse Mice

Purchase Collaborative Cross Mice (Search for “Collaborative Cross” to view available strains)
Purchase Diversity Outbred Mice

Genotyping

Collaborative Cross mouse genomes are available below.
Diversity Outbred mice may be genotyped using the GigaMUGA through Neogen, which is currently the only vendor that can process the GigaMUGA.

Diversity Outbred Genome Report

This report details the founder allele frequencies across the genome. It may be useful if you are investigating a specific founder allele at a specific position.

https://jackson.jax.org/rs/444-BUH-304/images/2022-J.DO-Genetic-Diversity-Report.pdf

This page contains reference data for the Collaborative Cross (CC) and Diversity Outbred (DO) founders. You will find reference data for:

  • Mouse Universal Genotyping Array (MUGA)
    • MUGA
    • MegaMUGA
    • GigaMUGA
  • Founder De Novo Genome Assemblies
  • Founder Pseudogenomes (i.e., SNPs & Indels inserted)
  • Genomic Variants

Mouse Universal Genotyping Array Series

MUGA

Marker positions

File                  GRCm38                  GRCm39                  Description
Marker positions      muga_uwisc_v1.csv       muga_uwisc_v2.csv       CSV file. 7,854 rows. Contains cM values.
Marker file metadata  muga_uwisc_dict_v1.csv  muga_uwisc_dict_v2.csv  Column descriptions for marker file.

CC/DO Founders

  • Founder Sample Metadata
    • Founder Metadata: 48 rows, 5 columns. Contains sex and strain background for each sample (6 DO samples excluded).
  • Founder Consensus Genotypes
    • Genotypes: Gzipped CSV file. 7,762 rows, 9 columns. Genotypes encoded as one letter: A, C, G, T, H, or N.
  • Founder Probe Intensities
    • X Channel: Gzipped CSV file. 7,762 rows, 9 columns.
    • Y Channel: Gzipped CSV file. 7,762 rows, 9 columns.

All Reference Samples

  • Reference Sample Metadata
  • Genotypes
    • Genotypes: Gzipped CSV file. 7,762 rows, 55 columns. Genotypes encoded as one letter: A, C, G, T, H, or N.
  • Probe Intensities
    • X Channel: Gzipped CSV file. 7,762 rows, 55 columns.
    • Y Channel: Gzipped CSV file. 7,762 rows, 55 columns.
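Because the genotype files use a fixed one-letter alphabet (A, C, G, T, H, or N), a quick sanity check after download is to tally the codes per sample and reject anything outside that alphabet. The following is a minimal sketch using only the Python standard library. It assumes the first CSV column holds the marker ID and every remaining column is one sample; that layout is an assumption, so check the real header first. The function name `count_genotype_codes` and the tiny in-memory example are hypothetical.

```python
import csv
import gzip
import io
from collections import Counter

# The six one-letter genotype codes listed in the file descriptions.
VALID_CODES = set("ACGTHN")


def count_genotype_codes(handle):
    """Count genotype codes per sample column in a genotype CSV stream.

    Assumes column 1 is the marker ID and all other columns are samples
    (an assumption; verify against the header of the real file).
    """
    reader = csv.reader(handle)
    header = next(reader)
    samples = header[1:]
    counts = {sample: Counter() for sample in samples}
    for row in reader:
        for sample, code in zip(samples, row[1:]):
            if code not in VALID_CODES:
                raise ValueError(
                    f"unexpected genotype code {code!r} at marker {row[0]}")
            counts[sample][code] += 1
    return counts


# Made-up three-marker example standing in for a real gzipped file.
example = "marker,A,B\nm1,A,G\nm2,H,N\nm3,A,A\n"
compressed = gzip.compress(example.encode())
with gzip.open(io.BytesIO(compressed), "rt", newline="") as fh:
    counts = count_genotype_codes(fh)
```

For a downloaded file, replace the in-memory stand-in with `gzip.open(path, "rt", newline="")`, where `path` is wherever the file was saved.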

MegaMUGA

Marker positions

File                  GRCm38                GRCm39                Description
Marker positions      mm_uwisc_v1.csv       mm_uwisc_v2.csv       CSV file. 77,808 rows. Contains cM values.
Marker file metadata  mm_uwisc_dict_v1.csv  mm_uwisc_dict_v2.csv  Column descriptions for marker file.

CC/DO Founders

File                   Link                                         Description
Founder metadata       MegaMUGA_founder_metadata.csv                CSV file. 175 rows, 5 columns.
Consensus genotypes    MegaMUGA_founder_consensus_genotypes.csv.gz  Gzipped CSV file. 75,053 rows, 9 columns. Genotypes encoded as one letter: A, C, G, T, H, or N.
X channel intensities  MegaMUGA_founder_mean_x_intensities.csv.gz   Gzipped CSV file. 75,053 rows, 9 columns.
Y channel intensities  MegaMUGA_founder_mean_y_intensities.csv.gz   Gzipped CSV file. 75,053 rows, 9 columns.

All Reference Samples

File                   Link                           Description
Sample metadata        MegaMUGA_sample_metadata.csv   CSV file. 364 rows, 4 columns.
Sample genotypes       MegaMUGA_genotypes.csv.gz      Gzipped CSV file. 75,058 rows, 354 columns. Genotypes encoded as one letter: A, C, G, T, H, or N.
X channel intensities  MegaMUGA_x_intensities.csv.gz  Gzipped CSV file. 75,058 rows, 354 columns.
Y channel intensities  MegaMUGA_y_intensities.csv.gz  Gzipped CSV file. 75,058 rows, 354 columns.

GigaMUGA

Marker positions

File                  GRCm38                GRCm39                Description
Marker positions      gm_uwisc_v1.csv       gm_uwisc_v4.csv       CSV file. 143,259 rows. Contains cM values.
Marker file metadata  gm_uwisc_dict_v1.csv  gm_uwisc_dict_v4.csv  Column descriptions for marker file.

CC/DO Founders

File                   Link                                         Description
Founder metadata       GigaMUGA_founder_metadata.csv                CSV file. 170 rows, 5 columns.
Consensus genotypes    GigaMUGA_founder_consensus_genotypes.csv.gz  Gzipped CSV file. 123,915 rows, 9 columns. Genotypes encoded as one letter: A, C, G, T, H, or N.
Chr Y Haplogroups      GigaMUGA_founder_consensus_genotypes_Y.csv   CSV file. 13 rows, 7 columns. Genotypes encoded as one letter: A, C, G, T.
Chr M Haplogroups      GigaMUGA_founder_consensus_genotypes_Mt.csv  CSV file. 20 rows, 6 columns. Genotypes encoded as one letter: A, C, G, T.
X channel intensities  GigaMUGA_founder_mean_x_intensities.csv.gz   Gzipped CSV file. 123,915 rows, 9 columns.
Y channel intensities  GigaMUGA_founder_mean_y_intensities.csv.gz   Gzipped CSV file. 123,915 rows, 9 columns.

All Reference Samples

File                   Link                           Description
Sample metadata        GigaMUGA_sample_metadata.csv   CSV file. 279 rows, 4 columns.
Sample genotypes       GigaMUGA_genotypes.csv.gz      Gzipped CSV file. 123,998 rows, 280 columns. Genotypes encoded as one letter: A, C, G, T, H, or N.
X channel intensities  GigaMUGA_x_intensities.csv.gz  Gzipped CSV file. 123,998 rows, 280 columns.
Y channel intensities  GigaMUGA_y_intensities.csv.gz  Gzipped CSV file. 123,998 rows, 280 columns.

R/qtl2 Support Files

File            GRCm38                  GRCm39                            Description
CC/DO variants  cc_variants.sqlite      cc_variants_grcm39_ens104.sqlite  SQLite database of CC/DO founder SNPs & Indels.
Genes           mouse_genes_mgi.sqlite  mouse_genes_ensembl104.sqlite     SQLite database of mouse genes.
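This page does not describe the table layout of the SQLite files, so it can help to inspect the schema before writing queries. The following is a minimal sketch using Python's built-in `sqlite3` module. It lists every table and its columns without assuming any table or column names. The helper name `describe_sqlite` and the `demo` table are hypothetical; the in-memory database stands in for a downloaded file such as `cc_variants.sqlite`.

```python
import sqlite3


def describe_sqlite(conn):
    """Return {table name: [column names]} for every table in the database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [column[1] for column in columns]
    return schema


# In-memory stand-in with a hypothetical table; for a downloaded file,
# use sqlite3.connect("cc_variants.sqlite") instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (chr TEXT, pos INTEGER)")
schema = describe_sqlite(conn)
conn.close()
```

Reading the schema first avoids hard-coding column names that may differ between the GRCm38 and GRCm39 builds of the databases.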

Founder De Novo Genome Assemblies

These data were generated by the Wellcome Sanger Mouse Genome Project. The conditions of use laid out by the authors are as follows:

"In this study, we have produced reference quality chromosome scale genomes for both classical and wild derived inbred laboratory mouse genomes. These genomes were produced using third generation long reads and Hi-C. The genomes and gene annotation are also available via the Ensembl and UCSC genome browsers. The Mouse Genomes Project releases genome sequence data, SNPs and other variant calls as a service to the research community. These data are released in accordance with the Fort Lauderdale agreement and Toronto agreements. As producers of these data we reserve the right to be the first to publish a genome-wide analysis of the data we have generated. The pre-publication data that we release is embargoed for publication except for analyses of single chromosomes in single strains or single gene loci across multiple strains. We strongly encourage researchers to contact us if there are any queries about referencing or publishing analysis based on pre-publication data obtained via this website (Email: mousegenomes@sanger.ac.uk OR the PI, Thomas Keane, tk2@ebi.ac.uk). "

Strain       EBI Page  FASTA file   Description
A/J          EBI page  FASTA file   Gzipped FASTA file.
129S1/SvImJ  EBI page  FASTA file   Gzipped FASTA file.
NOD/ShiLtJ   EBI page  FASTA file   Gzipped FASTA file.
NZO/HlLtJ    EBI page  FASTA file*  Gzipped FASTA file.
CAST/EiJ     EBI page  FASTA file   Gzipped FASTA file.
PWK/PhJ      EBI page  FASTA file   Gzipped FASTA file.
WSB/EiJ      EBI page  FASTA file   Gzipped FASTA file.

* This file is not yet listed on the EBI website.

Genomic Variants

These data were generated by the Wellcome Sanger Mouse Genome Project. The conditions of use laid out by the authors are in their README file.

File              GRCm38 (Ensembl 78)                              GRCm39 (Ensembl 104)           Description
SNPs              mgp.v5.merged.snps_all.dbSNP142.vcf.gz           mgp_REL2021_snps.vcf.gz        Gzipped VCF file.
SNPs Index        mgp.v5.merged.snps_all.dbSNP142.vcf.gz.tbi       mgp_REL2021_snps.vcf.gz.csi    Tabix index. *.tbi or *.csi file.
Indels            mgp.v5.merged.indels.dbSNP142.normed.vcf.gz      mgp_REL2021_indels.vcf.gz      Gzipped VCF file.
Indels Index      mgp.v5.merged.indels.dbSNP142.normed.vcf.gz.tbi  mgp_REL2021_indels.vcf.gz.csi  Tabix index. *.tbi or *.csi file.
Deletions         REL-1606-SV/mgpv5.SV_deletions.bed.gz            Not available                  Gzipped BED file.
Deletions Index   mgpv5.SV_deletions.bed.gz.tbi                    Not available                  Tabix index for BED file. *.tbi file.
Insertions        mgpv5.SV_insertions.bed.gz                       Not available                  Gzipped BED file.
Insertions Index  mgpv5.SV_insertions.bed.gz.tbi                   Not available                  Tabix index for BED file. *.tbi file.
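Before running a region query against one of the gzipped VCF files, it is often useful to confirm which strains it contains. The following is a minimal sketch using only the Python standard library. It reads the `#CHROM` header line and returns the sample columns, relying on the VCF convention that the first nine columns are fixed. It streams the file and does not use the tabix index. The function name `vcf_sample_names` and the sample names in the made-up example are hypothetical, not taken from the real files.

```python
import gzip
import io


def vcf_sample_names(handle):
    """Return the sample names from the #CHROM header line of a VCF stream."""
    for line in handle:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # Columns 1-9 are fixed (CHROM, POS, ID, REF, ALT, QUAL, FILTER,
            # INFO, FORMAT); any further columns are samples.
            return fields[9:]
        if not line.startswith("#"):
            break  # reached data lines without seeing the header
    raise ValueError("no #CHROM header line found")


# Made-up two-sample VCF standing in for a real download.
example = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tA_J\tCAST_EiJ\n"
    "1\t3000000\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t1/1\n"
)
with gzip.open(io.BytesIO(gzip.compress(example.encode())), "rt") as fh:
    samples = vcf_sample_names(fh)
```

Tabix-indexed files are block-gzipped, which Python's `gzip` module can read sequentially; random access by region needs a tabix-aware tool instead.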

These data are associated with "Resolution of structural variation in diverse mouse genomes reveals chromatin remodeling due to transposable elements," Cell Genomics, 2023. Please cite this publication if you use these data.

File                 GRCm38         GRCm39 (Ensembl 104)  Description
Structural Variants  Not available  merged_SV_VCF.zip     Gzipped VCF file.

Founder Pseudogenomes

These are GRCm38-based FASTA files with SNPs and Indels from the Sanger variant files inserted into each genome.

Strain       File                                            Description
A/J          A_J.with_unplac_unloc.fa.gz                     Gzipped FASTA file.
C57BL/6J     Mus_musculus.GRCm38.dna.primary_assembly.fa.gz  Gzipped FASTA file. Ensembl 102 reference.
129S1/SvImJ  129S1_SvImJ.with_unplac_unloc.fa.gz             Gzipped FASTA file.
NOD/ShiLtJ   NOD_ShiLtJ.with_unplac_unloc.fa.gz              Gzipped FASTA file.
NZO/HlLtJ    NZO_HlLtJ.with_unplac_unloc.fa.gz               Gzipped FASTA file.
CAST/EiJ     CAST_EiJ.with_unplac_unloc.fa.gz                Gzipped FASTA file.
PWK/PhJ      PWK_PhJ.with_unplac_unloc.fa.gz                 Gzipped FASTA file.
WSB/EiJ      WSB_EiJ.with_unplac_unloc.fa.gz                 Gzipped FASTA file.
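Because indels are inserted into the pseudogenomes, sequence lengths can differ from strain to strain, so it can be worth tabulating them before comparing coordinates across files. The following is a minimal sketch using only the Python standard library. It streams a gzipped FASTA file and records the length of each sequence. The function name `fasta_lengths` and the tiny in-memory example are hypothetical; the sequence names in the real files may differ.

```python
import gzip
import io


def fasta_lengths(handle):
    """Return {sequence name: length in bases} from a FASTA text stream."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token after '>' as the name.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Made-up two-sequence FASTA standing in for a real download.
example = ">1 dna:chromosome\nACGTAC\nGT\n>MT\nNNAC\n"
with gzip.open(io.BytesIO(gzip.compress(example.encode())), "rt") as fh:
    lengths = fasta_lengths(fh)
```

For a downloaded file, open it with `gzip.open(path, "rt")`; the function reads line by line, so whole chromosomes are never held in memory.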

Collaborative Cross Reference Data

We provide consensus genotypes and a qtl2-style allele probs object for use in genetic mapping. We also provide Chr M and Y genotypes and founder funnel codes.

Data compiled from:
Srivastava et al. Genetics, 2017
Shorter et al. G3, 2019

File                                  Link                        Description
Consensus Genotypes                   CC_consensus_genotypes.csv  CSV file. 133,948 rows, 77 columns. ACGT genotypes for 76 CC strains.
Founder Allele Probabilities          CC_diplotype_probs.rds      R binary file (*.rds). “allele probs” object in qtl2 format. 76 strains, 8 founders, 120,625 markers.
Chr M & Y Genotypes and Funnel Codes  cc_sample_metadata.csv      CSV file. 76 rows, 5 columns.