I am a Research Fellow in the Department of Data Science at Dana-Farber Cancer Institute and the Harvard T.H. Chan School of Public Health. I joined Dr. X. Shirley Liu's lab in July 2019 and am also co-mentored by Dr. Myles Brown and Dr. Xihong Lin. I did my PhD in Bioengineering in Dr. Jun S. Song's group at the University of Illinois at Urbana-Champaign, working for the NIH Big Data to Knowledge (BD2K) U54 center.
I have developed computational biology and statistical learning methods to decipher human genetic variants associated with cancer risk. I have also been developing machine learning methods for single-cell genomics data to understand the cells within tumors.
I also enjoy working on patient genomics data to understand disease and identify therapeutic targets. Our collaborators include Dr. Catherine J. Wu at DFCI and Dr. David E. Fisher at MGH.
My research interests lie in developing statistical and machine learning methods for data-driven discoveries in biomedicine.
For fun, I enjoy hiking, music and wildlife photography.
I'll join Duke University in Jan 2024 as an Assistant Professor in the Department of Neurosurgery and the Department of Bioinformatics & Biostatistics.
Grad students and postdocs in the areas of computational biology, bioinformatics, cancer genomics, and machine learning are welcome to join us!
Website: Computational Biology Lab @ Duke
Ph.D. Bioengineering, Aug. 2014 - May 2019
Bachelor of Science, Biosciences, Sep. 2010 - Jul. 2014
Previous genome-wide association studies (GWAS) have identified common genetic variants that modulate cancer susceptibility, but their causal mechanisms remain largely unknown. We have developed computational methods for the functional interpretation of cancer-associated variants by combining analyses of expression quantitative trait loci (eQTL), a modified version of allele-specific expression (ASE) that systematically utilizes haplotype information, transcription factor (TF) binding preference, and epigenetic information. Databases and techniques useful in this study include TCGA, ENCODE, GTEx, Roadmap Epigenomics, TF motif databases, genotyping arrays, RNA-seq, ChIP-seq, ChIA-PET, Hi-C, etc. Our computational framework provides an effective means to integrate GWAS results with high-throughput genomic and epigenomic data.
Functional interpretation of non-coding cancer variants has been successful, thanks to the epigenetic information available for the corresponding cancer cell types and matched normal tissues. However, this approach does not explore the potential effect of germline risk variants on other important cell types that constitute the microenvironment of a tumor or its precursor. We show evidence that a breast cancer-associated variant may regulate a tumor-suppressing gene in tumor-infiltrating lymphocytes, in particular T lymphocytes and natural killer (NK) cells. Our hypothesis raises the possibility that cancer variants could be functional in immune cells in the tumor microenvironment, thereby modulating immune surveillance and affecting the clearing of early cancer-initiating cells. This idea flashed into my mind after a journal club in the Song Group. Thanks to the immune cells for their surveillance.
Next-generation sequencing (NGS) techniques are revolutionizing biomedical research by providing powerful methods for generating genomic and epigenomic profiles. However, a concise introductory learning resource is lacking. We have developed an interactive online educational resource called SequencEnG (acronym for Sequencing Techniques Engine for Genomics) that provides a tree-structured knowledge base of 71 different sequencing techniques, along with step-by-step NGS data analysis pipelines that compare popular tools. SequencEnG is part of the KnowEnG (Knowledge Engine for Genomics) project. I wish I had had a resource like this when I first entered bioinformatics :D
Update 2022-10: SequencEnG has 10,985 users worldwide!
Mathematical Statistics | Machine Learning | Computational Cancer Biology |
Stochastic Processes | Statistical Learning | Statistical Data Analysis in Physics |