DepMicroDiff: Diffusion-Based Dependency-Aware Multimodal Imputation for Microbiome Data
Sadia, Rabeya Tus, Cheng, Qiang
arXiv.org Artificial Intelligence
Microbiome data analysis is essential for understanding host health and disease, yet its inherent sparsity and noise pose major challenges for accurate imputation, hindering downstream tasks such as biomarker discovery. Existing imputation methods, including recent diffusion-based models, often fail to capture the complex interdependencies between microbial taxa and overlook contextual metadata that can inform imputation. We introduce DepMicroDiff, a novel framework that combines diffusion-based generative modeling with a Dependency-Aware Transformer (DAT) to explicitly capture both mutual pairwise dependencies and autoregressive relationships. DepMicroDiff is further enhanced by VAE-based pretraining across diverse cancer datasets and conditioning on patient metadata encoded via a large language model (LLM). Experiments on TCGA microbiome datasets show that DepMicroDiff substantially outperforms state-of-the-art baselines, achieving higher Pearson correlation (up to 0.712), cosine similarity (up to 0.812), and lower RMSE and MAE across multiple cancer types, demonstrating its robustness and generalizability for microbiome imputation.

Microbiome data analysis plays a critical role in understanding host health, disease progression, and therapeutic response, particularly in contexts such as cancer progression, gut-brain interactions, and immunotherapy [1]. However, microbiome datasets, derived from 16S rRNA or metagenomic sequencing, are notoriously sparse and noisy due to limitations in sequencing technologies, biological variability, and compositional constraints.
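The four evaluation metrics named in the abstract (Pearson correlation, cosine similarity, RMSE, MAE) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `imputation_metrics`, the boolean `mask` marking held-out entries, and the choice to score only those entries are assumptions made for illustration.

```python
import numpy as np

def imputation_metrics(truth, imputed, mask):
    """Score imputed values against ground truth on masked entries only.

    Hypothetical helper: `mask` is True where a value was hidden from the
    model and later imputed (an assumed protocol, not taken from the paper).
    """
    mask = np.asarray(mask, dtype=bool)
    t = np.asarray(truth, dtype=float)[mask]
    p = np.asarray(imputed, dtype=float)[mask]
    err = p - t
    return {
        # Linear correlation between true and imputed values
        "pearson": float(np.corrcoef(t, p)[0, 1]),
        # Cosine of the angle between the two value vectors
        "cosine": float(t @ p / (np.linalg.norm(t) * np.linalg.norm(p))),
        # Root mean squared error
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        # Mean absolute error
        "mae": float(np.mean(np.abs(err))),
    }

# Toy example: a 2x3 abundance matrix with three held-out entries.
truth = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
imputed = np.array([[1.0, 2.5, 3.0], [3.0, 5.0, 6.5]])
mask = np.array([[False, True, False], [True, False, True]])
scores = imputation_metrics(truth, imputed, mask)
```

Higher Pearson and cosine values and lower RMSE and MAE indicate better imputation, which matches the direction of the improvements reported above.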
Aug-1-2025
- Country:
- Asia > Middle East
- Jordan (0.04)
- North America > United States
- Kentucky (0.04)
- Genre:
- Research Report > New Finding (0.46)
- Industry:
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Technology: