Why the Y?

Colored blocks falling into an "X" and "Y" pattern, representing genetic sequencing.

Pille Hallast, Ph.D., and collaborators are the first to sequence not one but multiple diverse human Y chromosomes in their entirety. Her work reveals the complexities of the male sex chromosome and creates a foundation for future studies of paternal lineage and male-linked disease.

Background check

Pille Hallast, Ph.D., is a detective, but not in the traditional sense. As an associate research scientist in the Charles Lee Lab, she focuses her research on deciphering the clues hidden within our genome. Hallast identifies and traces genetic variation in the genome from generation to generation. Her work reveals how these inherited variants can be used to map migration patterns, pinpoint the timeline of historical events, determine ancestry and study human health. In particular, Hallast has spent many years deciphering the mysteries of the Y chromosome, the sex-determining chromosome for biological males.

“The Y chromosome is a useful genetic marker that can be traced through generations,” says Hallast. “It is directly inherited from father to son with very little or no change and, therefore, is a useful tool for understanding human male population history.”

The Y chromosome acts as a genetic fingerprint passed down through the paternal lineage. It has stretches of code that are almost never recombined or swapped out with other genetic material. This creates a signature for the men of a specific family or line. But over great expanses of time, natural alterations or mutations to this genetic code occur, making the fingerprint even more distinct. From there, researchers like Hallast can build family trees, or phylogenies, describing the evolutionary history and biological relationships between its male members.

“As mutations occur in the Y chromosome, they will appear in the male descendants’ genome. If a specific mutation occurred in your grandfather’s Y chromosome, all his male descendants will carry this mutation because it is highly unlikely that it will disappear within that short of a timeframe,” says Hallast. “Based on these informative sites, we can generate a highly robust phylogeny to determine the relationships between chromosomes and even estimate when the mutation occurred.”
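The logic Hallast describes can be illustrated with a minimal Python sketch. It assumes a simplified model in which a mutation, once it arises, is passed to every male descendant and is never lost or recombined away. The names, the mutation labels (M1, M2 and so on) and the helper functions are invented for illustration and are not taken from the study. Grouping men by the mutations they share yields nested groups, which is what allows a family tree to be drawn.

```python
from collections import defaultdict

# Hypothetical toy data: each man's Y chromosome is reduced to the set of
# mutations it carries. Names and mutation labels are invented.
carriers = {
    "grandfather": {"M1"},
    "father": {"M1", "M2"},
    "uncle": {"M1"},
    "son": {"M1", "M2", "M3"},
    "unrelated_man": {"M9"},
}


def clades_by_mutation(samples):
    """Map each mutation to the set of men who carry it."""
    clades = defaultdict(set)
    for name, mutations in samples.items():
        for mutation in mutations:
            clades[mutation].add(name)
    return dict(clades)


def is_nested(clades):
    """Return True if any two groups are either disjoint or one contains the other.

    This is expected when mutations are inherited without recombination or
    loss, which is the simplifying assumption of this sketch.
    """
    groups = list(clades.values())
    for i, a in enumerate(groups):
        for b in groups[i + 1:]:
            if a & b and not (a <= b or b <= a):
                return False
    return True


clades = clades_by_mutation(carriers)

# List the groups from largest (oldest shared mutation) to smallest.
for mutation in sorted(clades, key=lambda m: (-len(clades[m]), m)):
    print(mutation, sorted(clades[mutation]))

print("Groups form a tree:", is_nested(clades))
```

In this toy family, the group defined by the grandfather's mutation contains the group defined by the father's later mutation, which in turn contains the son's. Real analyses work with many thousands of sites and use statistical models to estimate when each mutation arose, which this sketch does not attempt.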

One of Hallast’s earlier studies used the genetic variation of the Y chromosome to investigate European population history and was one of the first to report a major replacement and expansion of specific male lineages during the Bronze Age. Her work showed how mutations tracked through paternal lineages correlated with the rapid and widespread population changes of the era. It was even cited in the mystery science fiction series The X-Files, in Season 10, Episode 2, “Founder’s Mutation.”

“My family was so proud. While none of them are into genetics, they love that my name has been mentioned on X-Files. For them, it has so far been the greatest moment of my career,” says Hallast.

Missing in (in)action

Hallast’s expertise in studying the Y chromosome has been proven yet again in her most recent research effort, "Assembly of 43 human Y chromosomes reveals extensive complexity and variation," published in Nature. Hallast, Charles Lee, Ph.D., FACMG, and collaborators from Clemson University, Heinrich Heine University (Germany) and other institutions are the first to fully piece together complete sequences from multiple diverse Y chromosomes, including two of the oldest currently known human Y lineages originating from Africa. Combining newly generated data with existing open-access data and modern long-read sequencing technologies, Hallast and collaborators used 43 samples from the 1000 Genomes Project dataset to sequence and compare Y chromosomes base by base. But hasn’t the human genome already been sequenced?

It is no secret that, in 2003, the Human Genome Project completed the first sequencing (reading base by base) of the human genome. While a great feat for modern science, the project did not capture the genome in its entirety. Roughly 92% of the human genome was sequenced, while the remaining 8% was, in effect, swept under the rug.

The Y chromosome is much smaller than its counterpart sex chromosome, the X, and most of the 22 other chromosomes. Because half or more (depending on the individual) of the Y is composed of highly repetitive and otherwise complex regions, it was impossible to resolve fully using the technology available at the time, mainly Sanger sequencing. Using Sanger sequencing, Skaletsky and colleagues reported roughly half of the human Y chromosome sequence in 2003.

“Despite being one of the smallest chromosomes in the human genome, the Y chromosome is very difficult to study because its base components are highly repetitive. The composition is quite complex as there are large regions, covering up to tens of millions of base pairs, which are highly similar on sequence level and therefore very challenging to accurately disentangle,” says Hallast.

Using long-read sequencing methods, the Telomere-to-Telomere (T2T) consortium has now published the first complete human Y chromosome assembly from a single individual of European descent, which is highlighted alongside Hallast’s work in Nature (Rhie et al., Nature). A single assembly still does not paint a completely accurate picture, however. Hallast’s research utilizes samples from Africa, Europe, East Asia and South Asia to create a diverse launching point for studying inheritance and health implications associated with the Y chromosome.

Under surveillance

Hallast found a huge amount of complexity in the 43 Y chromosomes sequenced and assembled. The authors observed an extremely high level of variation both in Y chromosome size (from 45.2 to 84.5 million base pairs) and in the degree of structural variation across the Y chromosomes. The study revealed that approximately half of the euchromatin (the gene-rich region) carries large recurrent inversions, which are segments that have switched their orientation. These flip-flopped segments were found to differ in orientation even between relatively closely related Y chromosomes, occurring at a rate much higher than elsewhere in the genome.

“What do these inversions cause? We don’t know, but now we know they exist and we can now start to investigate their potential functional implications,” says Hallast. “It changes the perspective of the Y chromosome…. We have viewed the genome as something relatively stable, but, with the advances of sequencing technologies and genome assembly tools, we are finally able to explore the true full extent of genetic variation, how it evolves over time and what the functional consequences might be.”

Hallast intends to continue to unravel the mysteries of the Y chromosome and its implications for human health, specifically for diseases where men are at higher risk. Perhaps, in the near future, the Y chromosome will reveal why men are more prone to certain diseases, infections or cancers. Interestingly, recent studies have already linked the loss of the entire Y chromosome to aggressive features in colorectal and bladder cancers. Now, thanks to Hallast’s work, researchers are able, for example, to explore which specific changes in the Y chromosome affect the immune system in men with certain cancer types. In addition, it may soon be revealed which parts of the Y hold significant genes or biomarkers linked to certain allergies, depression, weight loss or weight gain. Hallast is excited to finally bring data from the whole genome to light.

“It is about time that the Y chromosome is no longer ignored in these important ‘whole-genome’ association studies.”