I completed my PhD in Computer Science (specifically, statistical machine learning) at the National University of Singapore from 1997 to 2000. During this period, I worked on machine learning theory and also designed and implemented a new family of online learning algorithms for recognizing handwritten digits in the MNIST dataset, achieving state-of-the-art performance among online learning algorithms at that time.
From 2001 to 2003, I worked on microarray gene expression data, exploring supervised classification via relevance vector machines and unsupervised class discovery. Since 2004, I have been working in statistical genetics, dissecting the genetic basis of complex human diseases through analysis of genome-wide linkage and association data, exome-wide sequencing data, and RNA-sequencing data. Because the number of genetic variants far exceeds the sample size, I have explored and developed various statistical methods, including data imputation, regularized regression, generalized linear mixed models, pathway analysis, fine mapping, and evidence combination from heterogeneous data sources.
I have published more than 40 peer-reviewed papers in various journals, including first or co-first author publications in Nature Genetics, American Journal of Human Genetics, Machine Learning, Journal of Computer and System Sciences, and IEEE Transactions on Information Theory.
1997– 2000: National Univ. of Singapore, Singapore
Awarded Degree: PhD in Computer Science (awarded the IDA gold medal and prize for being the PhD student with the best thesis in 2001)
Thesis title: “From support vector machines to large margin classifiers” (Supervisor: Prof. Phil Long)
1990– 1993: Xi’an Jiaotong Univ., P R China. Computer Science.
Awarded Degree: M. Eng.
1986– 1990: Xi’an Jiaotong Univ., P R China. Computer Science.
Awarded Degree: B. Sci. (top 5%) (admitted to the university without taking the national university entrance examinations)
Jun 2017 – now: Senior computational scientist at Computational Sciences, The Jackson Laboratory, Farmington, CT, US
Jan 2007 – Jun 2017: Staff scientist at Human Genetics, Genome Institute of Singapore, Singapore
Apr 2002 – Dec 2006: Postdoc at Information & Mathematical Sciences, Genome Institute of Singapore, Singapore
Jan 2001 – Apr 2002: Research assistant at Dept. of Engineering Mathematics, Univ. of Bristol, U.K.
Large-scale haplotype association analysis, especially at the whole-genome level, is still a very challenging task without an optimal solution. In this study, we propose a new approach for haplotype association analysis that is based on a variable-sized sliding-window framework and employs regularized regression analysis to tackle the problem of multiple degrees of freedom in the haplotype test. Our method can handle a large number of haplotypes in association analyses more efficiently and effectively than do currently available approaches. We implement a procedure in which the maximum size of a sliding window is determined by local haplotype diversity and sample size, an attractive feature for large-scale haplotype analyses, such as a whole-genome scan, in which linkage disequilibrium patterns are expected to vary widely. We compare the performance of our method with that of three other methods--a test based on a single-nucleotide polymorphism, a cladistic analysis of haplotypes, and variable-length Markov chains--with use of both simulated and experimental data. By analyzing data sets simulated under different disease models, we demonstrate that our method consistently outperforms the other three methods, especially when the region under study has high haplotype diversity. Built on the regression analysis framework, our method can incorporate other risk-factor information into haplotype-based association analysis, which is becoming an increasingly necessary step for studying common disorders to which both genetic and environmental risk factors contribute.
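The variable-sized sliding-window idea described above can be illustrated with a short sketch. This is not the published implementation: the window-size rule (`per_hap`, `hard_cap`), the use of ridge-penalized logistic regression fitted by plain gradient descent, the one-haplotype-per-individual encoding, and all function names are assumptions made for illustration only.

```python
import math

def max_window_size(haplotypes, start, n_samples, per_hap=10, hard_cap=8):
    """Grow the window from `start` while haplotype diversity stays supportable.

    Hypothetical rule: allow at most n_samples // per_hap distinct haplotypes.
    """
    n_markers = len(haplotypes[0])
    limit = max(2, n_samples // per_hap)
    size = 1
    while start + size < n_markers and size < hard_cap:
        distinct = len({h[start:start + size + 1] for h in haplotypes})
        if distinct > limit:
            break
        size += 1
    return size

def indicator_columns(haplotypes, start, size):
    """One 0/1 column per distinct haplotype observed in the window."""
    window = [h[start:start + size] for h in haplotypes]
    labels = sorted(set(window))
    rows = [[1.0 if w == lab else 0.0 for lab in labels] for w in window]
    return labels, rows

def ridge_logistic(X, y, lam=1.0, lr=0.1, steps=500):
    """Ridge-penalized logistic regression by gradient descent (illustrative stand-in)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(steps):
        grad_w = [lam * wj / n for wj in w]  # penalty applies to weights, not intercept
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err / n
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return b, w

def scan(haplotypes, y):
    """Fit one penalized model per (start, size) window; return haplotype weights."""
    results = []
    for start in range(len(haplotypes[0])):
        top = max_window_size(haplotypes, start, len(y))
        for size in range(1, top + 1):
            labels, X = indicator_columns(haplotypes, start, size)
            _, w = ridge_logistic(X, y)
            results.append((start, size, dict(zip(labels, w))))
    return results

# Toy data (invented): four haplotypes, with "0011" enriched among cases (y = 1).
haps = ["0011"] * 10 + ["0101"] * 10 + ["1100"] * 10 + ["0110"] * 10
y = [1] * 8 + [0] * 2 + ([0] * 8 + [1] * 2) * 3
results = scan(haps, y)
```

Because the model is an ordinary regression on haplotype indicators, additional covariates such as environmental risk factors could be appended as extra columns of `X`; a real analysis would also need phased diploid data and a significance test, both omitted here.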