Integrating genomic data to identify candidate biomarkers and build phenotypic predictive models for cancer studies.
Advances in technology have made large volumes of high-throughput genomics data of multiple types available. Large consortia, such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC), have generated several high-throughput genomics data types for hundreds of samples across tens of cancer types, all of which are publicly available. These data have tremendous potential to reveal new and clinically valuable biomarkers that guide the diagnosis, prognosis assessment, and treatment of complex diseases such as cancer. Integrating, analyzing, and interpreting large and noisy genomics data to obtain biologically meaningful results, however, remains highly challenging. New computational methods based on statistical machine learning and integrative analysis of genomics data are required to identify, from an enormous amount of heterogeneous data, a short list of biologically meaningful candidate drivers of anti-cancer drug resistance.
Developing genomic aberration detection methods using next-generation sequencing data.
Structural variations (SVs), which include copy number variations (CNVs), translocations, and inversions of genomic segments, have gained considerable interest as types of genomic variation that play an important role in phenotypic diversity, evolution, and disease susceptibility in humans and other organisms. Advances in sequencing technologies have made high-throughput next-generation sequencing (NGS) data available, creating an opportunity to detect CNVs and other SVs more accurately and thus to investigate their roles more thoroughly. However, high levels of noise and bias, data heterogeneity, and the "big data" nature of NGS data limit how accurately current CNV detection tools can identify CNVs. We are developing novel CNV detection and visualization methods based on statistical signal processing techniques to identify CNVs more accurately and efficiently.
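To make the signal-processing view concrete, the following is a minimal sketch of read-depth-based CNV calling, not the methods under development here. It assumes the genome has already been split into fixed-size bins with one read-depth value per bin; the function names, the window size, and the threshold are illustrative assumptions. Each bin's depth is converted to a log2 ratio against the sample-wide median, smoothed with a median filter to suppress isolated noisy bins, and then thresholded.

```python
import numpy as np


def moving_median(x, window):
    """Median-filter a 1-D signal, padding the edges (window should be odd)."""
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    return np.median(windows, axis=1)


def call_cnv_bins(depth, window=5, threshold=0.4):
    """Label each bin as -1 (loss), 0 (neutral), or +1 (gain).

    `window` and `threshold` are hypothetical defaults chosen for illustration.
    """
    depth = np.asarray(depth, dtype=float)
    # log2 ratio of each bin's depth to the sample-wide median depth;
    # the +1 pseudocount avoids taking the log of zero
    log_ratio = np.log2((depth + 1.0) / (np.median(depth) + 1.0))
    # the median filter removes isolated spikes while preserving step edges
    smoothed = moving_median(log_ratio, window)
    calls = np.zeros(depth.size, dtype=int)
    calls[smoothed > threshold] = 1
    calls[smoothed < -threshold] = -1
    return calls
```

A real pipeline would also need to correct for GC content and mappability bias and to merge adjacent bins into segments, which this sketch leaves out.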
Developing computational methods for analyzing single-cell sequencing data.
Single-cell sequencing has emerged as a new technique that resolves cellular differences at higher resolution and gives a better understanding of the function of an individual cell. It is applied in several fields, such as cancer biology and neurobiology. Even though the structure and format of single-cell sequencing data are identical to those of regular (bulk) sequencing data, single-cell data introduce new analysis challenges, including an abundance of zeros (both biological and technical), increased variability, a much larger number of samples (thousands to millions of cells), and increased biases. Novel statistical and computational methods are required to address these challenges. Our current effort focuses on developing signal processing and machine learning methods for differential gene expression analysis, cell clustering, zero imputation, and cell classification.
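As a small illustration of the zero-abundance issue, the sketch below measures the fraction of zero counts per gene and applies a common library-size normalization followed by a log transform. It assumes a cells-by-genes matrix of raw counts; the function names and the `target_sum` value are illustrative assumptions, and this is a preprocessing sketch rather than an imputation or clustering method.

```python
import numpy as np


def zero_fraction_per_gene(counts):
    """Fraction of cells with a zero count, for each gene (column)."""
    counts = np.asarray(counts)
    return (counts == 0).mean(axis=0)


def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell (row) to `target_sum` total counts, then take log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=1, keepdims=True)
    # guard against empty cells so the division below is always defined
    lib_size[lib_size == 0] = 1.0
    return np.log1p(counts / lib_size * target_sum)
```

Note that a zero count stays zero after this normalization, so the transform alone cannot distinguish biological zeros from technical ones.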
Detection of breast cancer using digital mammograms.
Early diagnosis is essential to reducing deaths caused by breast cancer. Breast cancer screening with mammography has significantly reduced mortality rates through early detection of the disease. However, the complexity of mammograms and the high volume of exams per radiologist have resulted in a significant number of errors. Computer-aided diagnosis (CAD) systems are intended to assist radiologists in detecting true cancers, yet it has been shown that current CAD methods do not improve the diagnostic accuracy of mammography. Advances in computational systems, the development of new image processing techniques, and the availability of digital mammograms open an opportunity for more accurate and more efficient CAD systems. We are developing novel CAD methods that use deep learning approaches and incorporate prior mammograms, for both 2D mammograms (full-field digital mammography, FFDM) and 3D mammograms (digital breast tomosynthesis, DBT).