A tool for novel Least Diverged Ortholog prediction through machine learning
Identifying functionally equivalent proteins between species is a fundamental problem in comparative genetics. While orthology does not guarantee functional equivalence, the identification of orthologs (genes in different organisms that diverged by speciation) is often the first step in approaching this problem. Many methods are available for predicting orthologs. Recent approaches combine methods and filter candidate predictions by “voting”: assigning confidence to ortholog pairs based on the number of predictions by independent methods. Although voting is a heuristic, it maintains precision while increasing recall. Here we employ machine learning to optimize voting by learning which methods make better predictions and, in essence, giving those methods more votes. We present a new tool called WORMHOLE that predicts a strict subclass of orthologs called least diverged orthologs (LDOs) with a high level of functional specificity by learning features of orthology that are encoded in the patterns of predictions made by 17 constituent methods. We validate WORMHOLE using multiple measures of evolutionary divergence and functional relatedness, including community standards provided by the Quest for Orthologs consortium. WORMHOLE’s particular strength lies in predicting LDOs between distantly related species, where orthology is difficult to identify and is of critical importance for comparative biology.
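The difference between plain voting and weighted voting can be illustrated with a minimal sketch. This is not WORMHOLE's implementation: the method names, gene pairs, and weights below are hypothetical, and the weights are set by hand, whereas the abstract describes learning them from the patterns of predictions.

```python
from collections import Counter

# Hypothetical input: each prediction method maps to the set of
# (gene_in_species_A, gene_in_species_B) pairs it calls orthologs.
predictions = {
    "method_a": {("g1", "h1"), ("g2", "h2")},
    "method_b": {("g1", "h1"), ("g3", "h3")},
    "method_c": {("g1", "h1"), ("g2", "h2"), ("g3", "h9")},
}

def simple_votes(predictions):
    """Plain voting: every method contributes one vote per predicted pair."""
    votes = Counter()
    for pairs in predictions.values():
        votes.update(pairs)
    return votes

def weighted_votes(predictions, weights):
    """Weighted voting: each method's vote counts in proportion to its weight."""
    scores = {}
    for method, pairs in predictions.items():
        for pair in pairs:
            scores[pair] = scores.get(pair, 0.0) + weights[method]
    return scores

# Hand-picked weights (an assumption for illustration only); a learned
# model would instead estimate how much to trust each method.
weights = {"method_a": 0.5, "method_b": 2.0, "method_c": 1.0}

votes = simple_votes(predictions)
scores = weighted_votes(predictions, weights)
```

With these toy numbers, the pair `("g2", "h2")` collects two plain votes and `("g3", "h3")` only one, yet under weighting the single vote from the more trusted `method_b` ranks `("g3", "h3")` higher.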
Sutphin GL, Mahoney JM, Sheppard K, Walton DO, Korstanje R (2016) WORMHOLE: Novel Least Diverged Ortholog Prediction through Machine Learning. PLoS Computational Biology 12(11):e1005182