Research
Our lab uses statistical learning and deep learning as the unifying language to communicate across research areas in computational biology. The overarching goal of our research is to enable precision medicine using automated deep learning approaches.
Primary Research Interests:
- Automated Deep Learning (AutoDL)-powered Interpretation of Genetic Variations
  1.1 Novel AutoDL Method Development
  1.2 Interpretation of Genetic Variations
- Post-transcriptional and Transcriptional Regulatory Networks
  2.1 RNA Splicing and Processing
  2.2 Functional and Biomarker Analysis
1. Automated Deep Learning (AutoDL)-powered Interpretation of Genetic Variations
How can we build accurate models and draw inferences from ever-growing, inherently heterogeneous biomedical big data to discover new knowledge? This key question is challenging on two fronts: the rapid development of biotechnologies and the equally rapid development of modern deep learning.
Using our AutoDL framework AMBER (Figure 1), we aim to build task-specific complex computation systems, on the basis of cutting-edge neural network architectures, to model various genetic and molecular variations, and draw reliable and interpretable inferences for knowledge discovery and actionable guidelines in biomedicine.
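As a rough illustration of what searching over neural network architectures involves, the sketch below runs plain random search over a small, made-up search space. It is not AMBER's search strategy or API: the search space, the `toy_reward` function, and all names are hypothetical, and a real search would train and validate each candidate model instead of scoring it with a formula.

```python
import random

# Hypothetical search space: each layer picks one operation and one width.
SEARCH_SPACE = {
    "op": ["conv3", "conv7", "maxpool", "identity"],
    "filters": [16, 32, 64],
}


def sample_architecture(num_layers, rng):
    """Draw one candidate architecture as a list of per-layer choices."""
    return [
        {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}
        for _ in range(num_layers)
    ]


def toy_reward(arch):
    """Stand-in for validation performance (assumption, not a real metric).

    Rewards convolutional layers and mildly penalizes wide layers.
    """
    score = 0.0
    for layer in arch:
        if layer["op"].startswith("conv"):
            score += 1.0
        score -= layer["filters"] / 640.0
    return score


def random_search(num_layers=3, num_trials=50, seed=0):
    """Sample architectures at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best_arch, best_reward = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture(num_layers, rng)
        reward = toy_reward(arch)
        if reward > best_reward:
            best_arch, best_reward = arch, reward
    return best_arch, best_reward


if __name__ == "__main__":
    arch, reward = random_search()
    print(arch, round(reward, 3))
```

Random search is used here only because it is the simplest baseline; more sample-efficient search strategies replace the sampling step while keeping the same sample, evaluate, and keep-the-best loop.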
1.1 Novel AutoDL Method Development
To keep modern deep learning in step with rapid biotechnology development, we aim to develop novel, efficient AutoDL methods specifically tailored for deploying artificial intelligence in medicine.
In the long term, we hope our methods will democratize deep learning and thereby accelerate scientific discoveries in biomedicine.
Related publications:
- An automated framework for efficiently designing deep convolutional neural networks in genomics
- AMBIENT: Accelerated convolutional neural network architecture search for regulatory genomics
1.2 Interpretation of Genetic Variations
The promise of precision medicine relies on a comprehensive understanding of personal genetic variations, yet the enormous number of genetic variations makes exhaustive experimental validation intractable. Thus, we need computational tools to interpret and prioritize variant effects. Critically, these predictions must be free of bias from confounding factors such as allele frequency and population stratification.
We aim to employ AutoDL methods to explore an extremely large model space, searching for accurate and unbiased models for variant effect prediction. These automated, data-driven models can help us understand the causal effects of genetic variations in various contexts, including epigenetics, transcription, genome editing, and tumors.
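One common way to turn a sequence-based model into a variant effect predictor is to score the reference and alternate alleles separately and take the difference. The sketch below shows that idea only; `toy_model` (GC fraction) is a hypothetical stand-in for a trained predictor, the function names are our own, and the example does nothing to address confounders such as allele frequency or population stratification.

```python
def apply_snv(sequence, position, alt):
    """Return the sequence with one base substituted at a 0-based position."""
    if not 0 <= position < len(sequence):
        raise IndexError("position is outside the sequence")
    return sequence[:position] + alt + sequence[position + 1:]


def toy_model(sequence):
    """Hypothetical stand-in for a trained predictor: fraction of G/C bases."""
    return sum(base in "GC" for base in sequence) / len(sequence)


def variant_effect(model, sequence, position, alt):
    """Predicted effect of a variant: alternate score minus reference score."""
    ref_score = model(sequence)
    alt_score = model(apply_snv(sequence, position, alt))
    return alt_score - ref_score


if __name__ == "__main__":
    # Substituting A with G at index 1 raises the toy score.
    print(variant_effect(toy_model, "AAAA", 1, "G"))
```

Any callable that maps a sequence to a number can be passed as `model`, so the same scoring function works unchanged when the toy predictor is swapped for a trained network.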
2. Post-transcriptional and Transcriptional Regulatory Networks
We also study the molecular mechanisms that mediate the effects of genetic variation on phenotypic variation and disease prognosis. Importantly, each gene has two critical aspects: its steady-state abundance (measured by mRNA expression) and the information content within each mRNA molecule (e.g., mediated by RNA splicing and processing).
Enabled by high-throughput sequencing technologies, we characterize the variation of RNA expression, RNA splicing, RNA-protein interactions, and RNA modifications in health and disease using quantitative, statistical, and deep learning approaches (see Figure 2 for a deep-learning-augmented Bayesian statistical framework for analyzing RNA splicing).
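To give a concrete sense of what a Bayesian treatment of splicing can look like, the sketch below estimates the inclusion level of an exon (often called percent spliced in, PSI) from read counts using a Beta prior and a binomial likelihood. This is a textbook conjugate model, not the framework shown in Figure 2. The function name and the flat Beta(1, 1) default prior are assumptions, and the counts are used directly without the isoform-length normalization that a full analysis would need.

```python
def psi_posterior(inclusion_reads, skipping_reads, prior_a=1.0, prior_b=1.0):
    """Beta posterior for an exon inclusion level given read counts.

    With a Beta(prior_a, prior_b) prior and a binomial likelihood, the
    posterior is Beta(prior_a + inclusion_reads, prior_b + skipping_reads).
    Returns the posterior parameters, mean, and variance.
    """
    if inclusion_reads < 0 or skipping_reads < 0:
        raise ValueError("read counts must be non-negative")
    a = prior_a + inclusion_reads
    b = prior_b + skipping_reads
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return a, b, mean, variance


if __name__ == "__main__":
    a, b, mean, variance = psi_posterior(inclusion_reads=8, skipping_reads=2)
    print(f"posterior Beta({a}, {b}), mean={mean:.3f}, variance={variance:.4f}")
```

The prior parameters are where additional information could enter: a more informative prior shifts the estimate when read coverage is low and has little influence when coverage is high.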
2.1 RNA Splicing and Processing
RNA splicing and, more broadly, RNA processing events can modulate gene function orthogonally to the gene dosage effects controlled by the transcriptional machinery.
We seek to systematically investigate RNA molecular variations (gene expression, alternative splicing, RNA modification, protein-RNA interactions, etc.), genetic variations, and their crosstalk in health and disease.
Related publications:
- Deep-learning augmented RNA-seq analysis of transcript splicing
- CLIP-seq analysis of multi-mapped reads discovers novel functional RNA regulatory sites in the human transcriptome
2.2 Functional and Biomarker Analysis
Naturally, transcriptional and post-transcriptional regulatory events interact pervasively. Current gene-level analysis is a good first-order approximation, yet it is insufficient to capture the full landscape of molecular variations.
We use both robust statistical learning and automated, interpretable deep learning methods to understand the functional importance of RNA processing events. We then develop these events into biomarkers for disease diagnosis and prognosis.
Related publications:
- Longitudinal analysis of RNA splicing dynamics in SARS-CoV-2 infection reveals robust predictive biomarkers
- Pre-infection antiviral innate immunity contributes to sex differences in SARS-CoV-2 infection
- Cleft lip and cleft palate in Esrp1 knockout mice is associated with alterations in epithelial-mesenchymal crosstalk