Research

Our lab uses statistical learning and deep learning as the unifying language to communicate across research areas in computational biology. The overarching goal of our research is to enable precision medicine using automated deep learning approaches.

Primary Research Interests:

  1. Automated Deep Learning (AutoDL)-powered Interpretation of Genetic Variations
    1.1 Novel AutoDL Method Development
    1.2 Interpretation of Genetic Variations
  2. Post-transcriptional and Transcriptional Regulatory Networks
    2.1 RNA Splicing and Processing
    2.2 Functional and Biomarker Analysis

1. Automated Deep Learning (AutoDL)-powered Interpretation of Genetic Variations

How can we build accurate models and make inferences on ever-growing, inherently heterogeneous biomedical big data to discover new knowledge? This key question is challenging on two fronts: the rapid development of biotechnologies and the equally rapid development of modern deep learning.

Using our AutoDL framework AMBER (Figure 1), we aim to build task-specific complex computation systems, on the basis of cutting-edge neural network architectures, to model various genetic and molecular variations, and draw reliable and interpretable inferences for knowledge discovery and actionable guidelines in biomedicine.

Figure 1. Illustration of AMBER workflow. AMBER uses a compendium of training data to design deep learning models in functional genomics. AMBER designs network architecture by searching for optimal combinations of computational operations (blue box) and residual connections (pink box) for each layer, to construct a child model. Taking the optimal architecture, AMBER performs downstream functional analyses.
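
The search space described in Figure 1 can be illustrated with a minimal sketch. This is not AMBER's code or API: the operation names, the toy reward, and the function names below are hypothetical, and plain random search stands in for the search strategy, which the text above does not specify. The sketch only shows the idea of choosing one computational operation and a set of residual connections for each layer to form a child model.

```python
import random

# Hypothetical candidate operations for each layer (illustrative names only).
OPERATIONS = ["conv_k4", "conv_k8", "maxpool", "avgpool", "identity"]


def sample_architecture(num_layers, rng):
    """Sample one child architecture: an operation per layer plus
    binary residual-connection flags from every earlier layer."""
    arch = []
    for layer in range(num_layers):
        op = rng.choice(OPERATIONS)
        # One 0/1 flag per earlier layer; 1 means a residual connection.
        skips = [rng.randint(0, 1) for _ in range(layer)]
        arch.append({"op": op, "skips": skips})
    return arch


def toy_reward(arch):
    """Placeholder score (assumption). A real search would train the
    child model and use its validation performance instead."""
    n_conv = sum(layer["op"].startswith("conv") for layer in arch)
    n_skips = sum(sum(layer["skips"]) for layer in arch)
    return n_conv + 0.1 * n_skips


def random_search(num_layers=4, num_trials=50, seed=0):
    """Keep the best-scoring architecture among randomly sampled ones."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture(num_layers, rng)
        score = toy_reward(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score


if __name__ == "__main__":
    arch, score = random_search()
    for i, layer in enumerate(arch):
        print(i, layer["op"], layer["skips"])
    print("toy reward:", score)
```

In this sketch, layer i carries i residual flags, so the number of possible architectures grows quickly with depth, which is why an automated search is useful.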

1.1 Novel AutoDL Method Development

To keep pace with rapid developments in both biotechnology and modern deep learning, we aim to develop novel, efficient AutoDL methods tailored for deploying artificial intelligence in medicine.

In the long term, we hope our methods will democratize deep learning and thereby accelerate scientific discoveries in biomedicine.

Related publications:

1.2 Interpretation of Genetic Variations

The promise of precision medicine relies on a comprehensive understanding of personal genetic variations, but the enormous number of genetic variations makes exhaustive experimental validation intractable. Thus, we need computational tools to interpret and prioritize variant effects. Critically, these predictions must be free of bias from confounding factors such as allele frequency and population stratification.

We aim to employ AutoDL methods to explore an extremely large model space, searching for accurate and unbiased models for variant effect prediction. These automated, data-driven models can help us understand the causal effects of genetic variations in various contexts, including epigenetics, transcription, genome editing, and tumors.

Related publications:


2. Post-transcriptional and Transcriptional Regulatory Networks

We also study the molecular mechanisms that mediate the effects of genetic variation on phenotypic variation and disease prognosis. Importantly, each gene has two critical aspects: its steady-state abundance (measured by mRNA expression) and the information content within each mRNA molecule (e.g., mediated by RNA splicing and processing).

Enabled by high-throughput sequencing technologies, we characterize the variation of RNA expression, RNA splicing, RNA-protein interactions, and RNA modifications in health and disease using quantitative, statistical, and deep learning approaches (see Figure 2 for a deep-learning-augmented Bayesian statistical framework for analyzing RNA splicing).

Figure 2. RNA alternative splicing analysis by DARTS DNN and BHT. DARTS consists of two core components: a deep neural network (DNN) model that predicts differential alternative splicing between two conditions on the basis of exon-specific sequence features and sample-specific regulatory features, and a Bayesian hypothesis testing (BHT) statistical model that infers differential splicing by integrating empirical evidence in RNA-seq datasets with prior probabilities.
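
The way a prior probability and empirical evidence combine in Bayesian hypothesis testing can be sketched with the standard odds form of Bayes' rule. This is not the DARTS implementation: the function name is hypothetical, `prior` stands for a prior probability of differential splicing (for example, one supplied by a predictive model), and `bayes_factor` is an assumed summary of how strongly the RNA-seq data favor differential splicing over no change.

```python
def posterior_probability(prior, bayes_factor):
    """Combine a prior probability with a Bayes factor.

    posterior odds = prior odds * Bayes factor, then convert the odds
    back to a probability. Both inputs are illustrative assumptions.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if bayes_factor <= 0.0:
        raise ValueError("bayes_factor must be positive")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Same evidence (Bayes factor of 3), different priors.
    for prior in (0.2, 0.5, 0.8):
        print(prior, posterior_probability(prior, 3.0))
```

With the same evidence, a higher prior yields a higher posterior, which is the sense in which an informative prior can help when the RNA-seq evidence alone is weak.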

2.1 RNA Splicing and Processing

RNA splicing and, more broadly, RNA processing events can modulate gene function orthogonally to the gene dosage effects controlled by the transcriptional machinery.

We seek to systematically investigate RNA molecular variations (gene expression, alternative splicing, RNA modification, protein-RNA interactions, etc.), genetic variations, and their crosstalk in health and disease.

Related publications:

2.2 Functional and Biomarker Analysis

Gene transcriptional and post-transcriptional regulatory events interact pervasively. Current gene-level analysis is a good first-order approximation, yet it is insufficient to capture the full landscape of molecular variation.

We use both robust statistical learning and automated, interpretable deep learning methods to understand the functional importance of RNA processing events. Subsequently, we develop these events as biomarkers for disease diagnosis and prognosis.

Related publications:
