Duplications, deletions and variation, oh my! CNVs and the genomic landscape

We are all mutants in one way or another.

No, this isn’t the prelude to another X-Men movie. It’s just stating the fact that we all have variations in our genome that affect how we look, think and act. None that I know of confer laser vision or telekinetic ability, but some do allow a few people to run faster, jump higher or think more abstractly than most of us. Others cause disease or increase susceptibilities. Most, in isolation, create no significant difference.

But say you possessed a near-ideal genome sequence that stacked the odds in favor of longevity and physical and mental wellbeing. That still might not be good enough to escape genomic problems. While sequences capture most of the attention in clinical genomics, scientists have learned over the past decade that copy number variations (CNVs) also play a role.

What are CNVs?

The simple definition of a CNV is the presence of something other than two copies of a gene in the genome. Inheritance of genes was originally thought to be quite straightforward—each healthy individual has two copies of all their genes, one inherited from mom and the other from dad. What school children learn about Mendel and his peas, such as dominant and recessive inheritance, is based on this simple arithmetic.

CNVs can contribute to disease and other health issues. Surprisingly, recent research has shown that healthy individuals also typically have relatively large sections of DNA, characterized as between 1,000 and 5,000,000 bases, added and subtracted from their genomes. Therefore, where DNA is added (duplications), they may carry three or more copies of the genes in that DNA segment. Where it’s subtracted (deletions), they may carry one or even zero copies of a gene. There are apparently hundreds of such variations within most genomes.
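For readers who think in code, the arithmetic above can be sketched in a few lines of Python. This is only an illustration of the definitions in this post, not how CNVs are actually detected in the lab: the Segment class, the classify function and the example segments are all made up for the purpose, and the size limits are simply the 1,000 and 5,000,000 base figures quoted above.

```python
from dataclasses import dataclass

# Size range quoted in the text for CNV segments, in bases.
MIN_CNV_BASES = 1_000
MAX_CNV_BASES = 5_000_000
# The textbook expectation: one copy from each parent.
EXPECTED_COPIES = 2


@dataclass
class Segment:
    """A hypothetical stretch of DNA with an observed copy count."""
    name: str
    length_bases: int
    copies: int


def classify(segment: Segment) -> str:
    """Label a segment using the simple definitions from the text."""
    if segment.copies == EXPECTED_COPIES:
        return "two copies (no CNV)"
    if not MIN_CNV_BASES <= segment.length_bases <= MAX_CNV_BASES:
        # Assumption: anything outside the quoted size range is not
        # counted as a CNV here (for example, a whole chromosome).
        return "outside CNV size range"
    if segment.copies > EXPECTED_COPIES:
        return "duplication"
    return "deletion"


if __name__ == "__main__":
    # Invented example segments, purely for illustration.
    examples = [
        Segment("segment_A", 50_000, 3),
        Segment("segment_B", 200_000, 1),
        Segment("segment_C", 12_000, 0),
        Segment("segment_D", 80_000, 2),
    ]
    for seg in examples:
        print(f"{seg.name}: {seg.copies} copies -> {classify(seg)}")
```

The point of the sketch is just that a CNV is defined by two things at once: a copy count other than two, and a segment size within a given range.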

Changes in the copy number of entire chromosomes underlie well-known developmental disorders such as trisomy 21, or Down's syndrome. Trisomy 21 means that there are three copies of chromosome 21, and while it is not included in the CNV classification, it demonstrates the effects that can occur when additional gene copies are present on a massive scale. CNVs typically confer more subtle variation, adding up to a large number of small effects.

I’ll admit that I saw the CNV acronym in the genome literature for a while before doing more research. And even then it seemed like yet another confounding discovery that told us genome sequences are not the full story—indeed, they just begin to tell it. So like many other things—epigenetics, the microbiome, 3-D genomic structure and so on—CNVs are a vital part of genome biology.

A brief history

The unexpected abundance of CNVs was not recognized until soon after the reference human genome was completed. Charles Lee*, Ph.D., working at Harvard, was having difficulty with human genotyping in 2002. Genotype anomalies in patients were expected, but Lee discovered the healthy control group also showed confounding variability in their genomes. In particular, Lee kept finding additional copies of specific genes. This led to a research project in which he set out to measure just how common these additional copies were across the genome. Another group led by Michael Wigler, Ph.D., used different methods to investigate the phenomenon at Cold Spring Harbor Laboratory at roughly the same time. Both groups soon published papers showing that CNVs are common and occur throughout the human genome.

Since then there has been a general recognition that CNVs contribute a great deal to human genomic diversity. They may provide as much as three times the amount of base variation as the better-known single nucleotide polymorphisms (SNPs), where single base pairs vary between individuals, often without affecting function. Research has also indicated that CNVs can alter gene expression and influence the regulation of genes in the vicinity.

Disease implications

Associating CNVs with disease is an ongoing field of research. It adds yet another challenging twist in the large effort to match genotype with phenotype. That is, how does our genetic material add up to produce all of our traits, including our susceptibilities to disease? Evidence is mounting that CNVs can play significant roles, and a few specific examples have been found.

Interestingly, CNVs have been found to confer protection from some infectious diseases, including HIV and malaria. Rare CNVs have also been implicated in neurological disorders such as intellectual disability, autism and schizophrenia. SNPedia has a page of facts and links that, while last updated nearly two years ago, provides an interesting overview of research to that time.

Overall, however, finding associations between CNVs and disease has seldom been straightforward. While they have been implicated in many other diseases, the specific role or roles they play remain unclear. As with much of genomic research, we are getting a better idea of the general situation, but we are in the early days of understanding how it all fits together. The abstract of a recent review paper expresses it well: “However, the landscape of copy number variation still remains largely unexplored, especially for smaller CNVs and those embedded within complex regions of the human genome.”

*Dr. Lee was recently named to lead the scientific effort at The Jackson Laboratory for Genomic Medicine in Farmington, Conn., beginning Aug. 1, 2013. He joined a faculty already investigating organizational, structural and regulatory systems and variations within the human genome. Such genome-wide studies look for factors that influence genome function and disruptions that contribute to complex disease.

Figure from Nature 464, 704-712 (1 April 2010) | doi:10.1038/nature08516