research

Our lab develops machine learning methods for high-dimensional, sparse, heterogeneous, and multimodal genomics data. Our work is motivated by a gap: cellular and molecular systems are highly complex and dynamic, yet individual experiments often capture only partial, limited snapshots of biological processes. To address this challenge, we develop network-based and deep learning approaches that integrate large-scale public datasets, including both bulk and single-cell data, along with prior biological knowledge. Our research interests lie in the following areas:

  • predicting multimodal representations of cells;
  • transferring experimental insights from model organisms to human contexts;
  • identifying genes and pathways underlying human diseases and sex differences.


multimodal machine learning for biology

Cellular functions are coordinated through dynamic interactions among diverse molecular entities, including genes, proteins, and regulatory DNA elements. However, experimental technologies typically capture only partial, static snapshots of this complex regulatory landscape. Our lab develops deep learning methods that leverage large-scale public genomic datasets to reconstruct multimodal cellular profiles and infer the dynamic relationships between data modalities, enabling a more comprehensive understanding of gene regulation and cellular state.

Selected publications:


knowledge transfer across species

Model organisms, such as mouse, have been widely used to uncover molecular mechanisms relevant to human biology, particularly in contexts where human samples are scarce or inaccessible (e.g., the brain). However, evolutionary divergence between species poses significant challenges for direct knowledge transfer, often limiting the translational value of findings. Our lab develops network-based and deep learning approaches to project and contextualize large-scale genomics data from model organisms, with the goal of improving our understanding of human health and disease.

Selected publications:


network-based characterization of human diseases

Complex diseases emerge from the combined effects of many genetic risk factors, environmental exposures, and their context-dependent interactions across different biological systems. Our lab develops network-based machine learning approaches that mine large-scale, high-dimensional genomics data in public databases to identify genetic contributors to disease and model their functional impact in tissue-specific, cell-type–specific, and single-cell contexts.

Selected publications: