Genetic component dissection for complex human diseases via integrative analysis of diverse genetic and genomic data; method development for data analysis; machine learning
I pursued my PhD in Computer Science (specifically, statistical machine learning) from 1997 to 2000 at the National University of Singapore. During this period, I not only worked on machine learning theory, but also designed and implemented a new family of online learning algorithms to recognize the handwritten digits in the MNIST dataset, achieving state-of-the-art performance among online learning algorithms at that time.
From 2001 to 2003, I worked on microarray gene expression data. The methods I explored included supervised classification via relevance vector machines and unsupervised class discovery. Since 2004, I have been working in statistical genetics, dissecting the genetic bases of complex human diseases through analysis of genome-wide linkage and association data, exome-wide sequencing data, and RNA-sequencing data. Because the number of genetic variants far exceeds the sample size, I have explored and developed various statistical methods, including data imputation, regularized regression, generalized linear mixed models, pathway analysis, fine mapping, and evidence combination from heterogeneous data sources.
I have published more than 40 peer-reviewed papers in various journals, including first- or co-first-author publications in Nature Genetics, American Journal of Human Genetics, Machine Learning, Journal of Computer and System Sciences, and IEEE Transactions on Information Theory.