With the completion of the human genome sequence and the sequences of other organisms used to model human disease, it appeared that the greatest discoveries of the genome were behind us. However, researchers have recently revealed a new level of genetic complexity that will forever change our perception of the human genome as a stable entity. This newly discovered phenomenon, designated "breakthrough of the year" by the journal Science in December 2007 (Pennisi 2007), is the presence of copy number variations. Defined as deletions or duplications of DNA greater than 1 kb (Feuk et al. 2006), copy number variations were known to exist, but their presence in such large numbers reveals the genome is much more dynamic than previously thought.
To date, 12% of the human genome is estimated to be associated with copy number differences (Redon et al. 2006), and more than 31,615 copy number variations have been catalogued in the Database of Genomic Variants, a central repository for human copy number variation data. Of even greater significance, these genomic variants are predicted to contain hundreds of genes, including many associated with human diseases (Redon et al. 2006). However, copy number variations (CNVs) are not always associated with disease; they are abundant even among healthy individuals. Reports indicate that CNV sizes vary greatly, but as technologies for detecting them at higher resolution have improved, the majority of human CNVs have been shown to be relatively small, containing less than 10 kb of sequence (Eichler 2006).
The discovery of CNVs has had far-reaching implications for disease research, enabling scientists to correlate genetic alterations with disease. Since the advent of DNA technologies, scientists have searched for genetic abnormalities that explain the differences in disease susceptibility among individuals. Historically, this search has focused on allelic (sequence) differences. Although some diseases, such as Huntington's disease and some forms of type 2 diabetes, are monogenic, the vast majority of human diseases are believed to be polygenic, with numerous genes contributing to the disease phenotype.
With the recent discovery of CNV abundance, a new set of genetic information is now available for identifying novel factors responsible for human diseases and other complex traits. It remains to be seen whether causative factors previously attributed to environmental effects will be correlated with differences in gene copy number in future studies. Recently published reports have implicated copy number variation in a number of human diseases, including cancer (La Starza et al. 2007), mental retardation (Sharp et al. 2006), and neurological disorders (Lee and Lupski 2006). Also of interest, the presence of CNVs may explain the abundant gaps in the genome sequences of numerous organisms, including humans. These gaps are the result of regional complexities that prevent a consensus sequence from being developed in those regions. Thus, refining the location and content of CNVs may contribute to more complete and useful genome sequences.
With the value of CNV research becoming widely apparent, there is a great demand for detailed detection of CNVs in organisms used to model human disease. Both high-resolution SNP arrays and comparative genomic hybridization arrays have been instrumental in detecting CNVs in a number of model organisms, including the mouse (Li et al. 2004; Snijders et al. 2005; Graubert et al. 2007; Watkins-Chow and Pavan 2008; She et al. 2008). As in humans, the majority of mouse CNVs (approximately 77.5%) are predicted to contain genes (Graubert et al. 2007). Many CNVs have been identified in studies across different inbred mouse strains. In a 2007 study, CNVs in the genomic DNA from 21 inbred strains from The Jackson Laboratory were determined and catalogued (Graubert et al. 2007). The inbred "J" strains selected are the focus of an international effort to generate extensive genotypic and phenotypic data for the most commonly used inbred strains, and make it publicly available in a central location, the Mouse Phenome Database. The contribution of mouse CNVs to this resource adds a valuable tool for correlating genetic differences to disease susceptibility in mice and humans.
In a recent study, researchers at the National Institutes of Health discovered CNVs within an inbred population of C57BL/6J mice (Watkins-Chow and Pavan 2008), an interesting finding considering that these mice have been stably maintained through more than 200 generations of sibling mating. In this study, the copy number of the insulin-degrading enzyme (Ide) gene, previously associated with increased risk for type 2 diabetes (Scott et al. 2007) and Alzheimer's disease (Mueller et al. 2007; Vepsäläinen et al. 2007), was shown to be heterogeneous in more than 50% of C57BL/6J mice assessed. This discovery led to the conclusion that CNVs arise relatively quickly, even in "the most carefully maintained colonies in the world" (Watkins-Chow and Pavan 2008). Similar studies in humans further support the rapid development of CNVs. In a recent report, comparisons of the genomic DNA of monozygotic twins led to the discovery of numerous CNVs between these otherwise "identical" individuals (Bruder et al. 2008).
Differences in gene copy number do not necessarily result in a change in the level of gene expression. For example, researchers have found that mice with duplications of the fibroblast growth factor binding protein 3 (Fgfbp3) genomic sequence express significantly more Fgfbp3 in the spleen, whereas expression levels in the brain are unchanged (Watkins-Chow and Pavan 2008). This suggests that organisms can differentially regulate the expression of genes in different tissues, potentially compensating for CNVs that might otherwise lead to disease.
Given that inbred mouse strains are the models of choice for determining the genetic basis of complex disease traits, including cancer, autoimmune disease, cardiovascular disease and diabetes, it has been suggested that "it is important for the research community to begin to identify existing CNVs in the most widely used strains where genetic drift has been limited by cryopreservation programs" (Watkins-Chow and Pavan 2008). Numerous other studies have stressed the need for such an effort. The high level of allele fixation (homozygosity) in inbred strains facilitates detecting CNVs and determining the genetic features that give these strains the unique phenotypes that make them useful models of human disease (Adams et al. 2005; She et al. 2008; Cahan et al. 2008).
The most commonly used inbred and mutant strains distributed by The Jackson Laboratory are part of a unique genetic stability program designed to minimize genetic drift through frequent recovery of cryopreserved stocks. The mice produced through this program will be a critical resource for unlocking the dynamic process of CNVs arising in the genome, and for studying the effects of CNVs in a highly stabilized genetic environment.
One of the prime directives of genomic research is to determine the genetic causes of human disease, and the discovery of CNVs is a significant milestone toward achieving that mission.
Adams et al. 2005. Nat Genet 37:532-6.
Bruder et al. 2008. Am J Hum Genet 82:763-71.
Cahan et al. 2008. Nucleic Acids Res 36:e41.
Eichler 2006. Nat Genet 38:9-11.
Feuk et al. 2006. Nat Rev Genet 7:85-97.
Graubert et al. 2007. PLoS Genet 3:e3.
La Starza et al. 2007. Cancer Genet Cytogenet 175:73-6.
Lee and Lupski 2006. Neuron 52:103-21.
Li et al. 2004. Nat Genet 36:952-4.
Mueller et al. 2007. Neurobiol Aging 28:727-34.
Pennisi 2007. Science 318:1842-3.
Redon et al. 2006. Nature 444:444-54.
Scott et al. 2007. Science 316:1341-5.
Sharp et al. 2006. Nat Genet 38:1038-42.
She et al. 2008. Nat Genet 40:909-14.
Snijders et al. 2005. Genome Res 15:302-11.
Vepsäläinen et al. 2007. J Med Genet 44:606-8.
Watkins-Chow and Pavan 2008. Genome Res 18:60-6.