During my Ph.D., I gained extensive experience in NGS data analysis, focusing on computational and statistical methods for ChIP-seq data. I developed methods for peak calling and differential peak calling in ChIP-seq data from cancer samples, and I built an integrated database of ChIP-seq-derived human enhancers. I also contributed to several other projects, including the FANTOM5 short non-coding RNA bioinformatics analysis, where I analyzed CAGE, RNA-seq, and ChIP-seq data.
Cancer cells are often characterized by epigenetic changes, including aberrant histone modifications. In particular, local or regional epigenetic silencing is a common mechanism by which cancer cells silence tumor suppressor genes. Although several tools exist for detecting histone marks in ChIP-seq data from normal samples, it is unclear whether they can be applied effectively to ChIP-seq data from cancer samples. Cancer genomes frequently carry copy number alterations: gains and losses of large regions of chromosomal material. Because read depth scales with the number of DNA copies present, these alterations can introduce a substantial statistical bias in the evaluation of histone mark signal enrichment, leading to underdetection of the signal in regions of loss and overdetection in regions of gain.
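The copy number bias can be illustrated with a small numerical sketch. It is only a toy model, not the method developed in the thesis: it assumes that read counts scale linearly with copy number, that the copy number of each window is already known, and that the background rate is a single genome-wide value estimated at normal ploidy. All function names and numbers are hypothetical.

```python
def expected_chip_reads(true_fold, background_rate, copy_number, ploidy=2):
    """Expected ChIP read count in a window under the toy model.

    Assumption: reads scale linearly with the number of DNA copies,
    so a window with copy_number copies yields copy_number / ploidy
    times the reads of a normal (diploid) window.
    """
    return true_fold * background_rate * copy_number / ploidy


def naive_enrichment(chip_reads, background_rate):
    """Enrichment against a genome-wide background, ignoring copy number."""
    return chip_reads / background_rate


def cn_adjusted_enrichment(chip_reads, background_rate, copy_number, ploidy=2):
    """Enrichment against a background rescaled by local copy number."""
    if copy_number <= 0:
        raise ValueError("copy_number must be positive to rescale the background")
    local_background = background_rate * copy_number / ploidy
    return chip_reads / local_background


if __name__ == "__main__":
    background_rate = 10.0  # hypothetical reads per window at normal ploidy
    true_fold = 3.0         # same true enrichment in every window

    # One region of loss, one normal region, one region of gain.
    for copy_number in (1, 2, 4):
        chip = expected_chip_reads(true_fold, background_rate, copy_number)
        naive = naive_enrichment(chip, background_rate)
        adjusted = cn_adjusted_enrichment(chip, background_rate, copy_number)
        print(f"CN={copy_number}: naive={naive:.1f}, adjusted={adjusted:.1f}")
```

With identical true enrichment in all three windows, the naive estimate is halved in the region of loss and doubled in the region of gain, whereas rescaling the background by the local copy number recovers the same value everywhere. Real data would also require estimating the copy number profile itself, which this sketch takes as given.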