research
Our lab develops machine learning methods for high-dimensional, sparse, heterogeneous, and multimodal genomics data. Our work is motivated by a gap: cellular and molecular systems are highly complex and dynamic, yet individual experiments often capture only partial, limited snapshots of biological processes. To address this challenge, we develop network-based and deep learning approaches that integrate large-scale public datasets, including both bulk and single-cell data, along with prior biological knowledge. Our research interests lie in the following areas:
- predicting multimodal representations of cells;
- transferring experimental insights from model organisms to human contexts;
- identifying genes and pathways underlying human diseases and sex differences.
multimodal machine learning for biology
Cellular functions are coordinated through dynamic interactions among diverse molecular entities, including genes, proteins, and regulatory DNA elements. However, experimental technologies typically capture only partial, static snapshots of this complex regulatory landscape. Our lab develops deep learning methods that leverage large-scale public genomic datasets to reconstruct multimodal cellular profiles and infer the dynamic relationships between data modalities, enabling a more comprehensive understanding of gene regulation and cellular state.

Selected publications:
- “Semi-supervised single-cell cross-modality translation using Polarbear” International Conference on Research in Computational Molecular Biology (2022)
- “Multimodal Single-Cell Translation and Alignment with Semi-Supervised Learning” Journal of Computational Biology (2022)
- “Multicondition and multimodality single-cell temporal profile inference during mouse embryonic development” Genome Research (2025, in press)
knowledge transfer across species
Model organisms, such as mouse, have been widely used to uncover molecular mechanisms relevant to human biology, particularly in contexts where human samples are scarce or inaccessible (e.g., the brain). However, evolutionary divergence between species poses significant challenges for direct knowledge transfer, often limiting the translational value of findings. Our lab develops network-based and deep learning approaches to project and contextualize large-scale genomics data from model organisms, with the goal of improving our understanding of human health and disease.


Selected publications:
- “Identifying genes and pathways linking astrocyte regional specificity to Alzheimer’s disease susceptibility” bioRxiv
- “Cross-species imputation and comparison of single-cell transcriptomic profiles” Genome Biology (2025)
network-based characterization of human diseases
Complex diseases emerge from the combined effects of many genetic risk factors, environmental exposures, and their context-dependent interactions across different biological systems. Our lab develops network-based machine learning approaches that mine large-scale, high-dimensional genomics data from public databases to identify genetic contributors to disease and model their functional impact in tissue-specific, cell-type–specific, and single-cell contexts.

Selected publications: