DIVIDE: A Framework for Learning from Independent Multi-Mechanism Data Using Deep Encoders and Gaussian Processes

Chawla, Vivek, Slautin, Boris, Pratiush, Utkarsh, Penumadu, Dayakar, Kalinin, Sergei

arXiv.org Artificial Intelligence 

ABSTRACT Scientific datasets often arise from multiple independent mechanisms such as spati al, categorical or structural effects, whose combined influence obscures their individual contributions. We introduce DIVIDE, a framework that disentangles these influences by integrating mechanism - specific deep encoders with a structured Gaussian Process in a joint latent space. Disentanglement here refers to separating independently acting generative factors . The encoders isolate distinct mechanisms while the Gaussian Process captures their combined effect with calibrated uncertainty. The architecture supports structured priors, enabling interpretable and mechanism - aware prediction as well as efficient active l earning. Across benc hmarks, DIVIDE separates mechanisms, reproduces additive and scaled interactions, and remains robust under noise. The framework extends naturally to multifunctional datasets where mechanical, electromagnetic or optical responses coexist. INTRODUCTION Many real - world systems exhibit behavior driven by the combined influence of multiple independent mechanisms. These mechanisms may represent categorical factors, spatial dependencies, or nonlinear physical responses. While the scalar output of such systems is observable, the individual contributions of these mechanisms are often unknown and unmeasured. Modeling this type of data requires not only accurate predictions but also the ability to attribute variation in the output to specific, distinct sources. In thi s context, we use disentanglement to mean recovering those independently acting generative factors from observational data. Disentangling these contributions is particularly important in scientific and engineering domains where interpretability, causality, and mechanism - aware reasoning are essential. Partial solutions to this challenge have emerged from the field of disentangled representation learning, which seeks to identify independent factors of variation from high - dimensional data.