Diffusion and Flow-based Copulas: Forgetting and Remembering Dependencies

Huk, David, Damoulas, Theodoros

arXiv.org Machine Learning 

Copulas are a fundamental tool for modelling multivariate dependencies in data, forming the method of choice in diverse fields and applications. However, the adoption of existing models for multimodal and high-dimensional dependencies is hindered by restrictive assumptions and poor scaling. In this work, we present methods for modelling copulas based on the principles of diffusions and flows. We design two processes that progressively forget inter-variable dependencies while leaving dimension-wise distributions unaffected, provably defining valid copulas at all times. We show how to obtain copula models by learning to remember the forgotten dependencies from each process, theoretically recovering the true copula at optimality. The first instantiation of our framework focuses on direct density estimation, while the second specialises in expedient sampling. Empirically, we demonstrate the superior performance of our proposed methods over state-of-the-art copula approaches in modelling complex and high-dimensional dependencies from scientific datasets and images. Our work enhances the representational power of copula models, empowering applications and paving the way for their adoption on larger scales and more challenging domains.

Given a collection of d continuous random variables, a simple model for their joint probability density function is the product of the corresponding d univariate densities (Peterson, 1987). This product model, however, is exact only when the variables are independent; by Sklar's theorem, any continuous joint distribution factorises uniquely into its univariate marginals and a copula that encodes the dependence between them. Indeed, the copula uniquely and exactly represents the inter-variable dependence, unlike correlation or mutual information (Geenens, 2023), fully disentangling the marginal behaviour from the joint. This disentanglement enables a modular approach to multivariate modelling: first, model the univariate variables independently, and second, model their dependence with a copula.
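The two-step modular recipe can be sketched in a few lines. This is a minimal illustration, not the method proposed in the paper: it uses empirical marginals (ranks) for step one and a simple Gaussian copula for step two, on synthetic data invented here for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic 2-d data with dependence: correlated Gaussian scores pushed
# through different marginal transformations (hypothetical example data).
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=2000)
x = np.column_stack([
    np.exp(z[:, 0]),           # heavy-tailed log-normal marginal
    stats.norm.cdf(z[:, 1]),   # bounded marginal on (0, 1)
])

# Step 1: model each marginal independently. Here we use the empirical CDF
# (ranks), mapping each column to approximately Uniform(0, 1)
# pseudo-observations; the dependence structure is untouched.
n = x.shape[0]
u = np.column_stack([stats.rankdata(x[:, j]) / (n + 1) for j in range(x.shape[1])])

# Step 2: model the dependence alone with a copula. For a Gaussian copula,
# map the pseudo-observations to normal scores and estimate their correlation.
scores = stats.norm.ppf(u)
rho = np.corrcoef(scores, rowvar=False)[0, 1]
print(f"fitted Gaussian-copula correlation: {rho:.2f}")
```

Because ranks are invariant to the monotone marginal transforms applied above, the fitted copula correlation recovers the latent dependence (close to 0.8) regardless of how distorted the marginals are, which is exactly the disentanglement the text describes.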