Uncertainty
Functional Bilevel Optimization for Machine Learning
Bilevel optimization methods solve problems with hierarchical structures, optimizing two interdependent objectives: an inner-level objective and an outer-level one. Initially used in machine learning for model selection [Bennett et al., 2006] and sparse feature learning [Mairal et al., 2012], these methods gained popularity as efficient alternatives to grid search for hyper-parameter tuning [Feurer and Hutter,
Scaling Continuous Latent Variable Models as Probabilistic Integral Circuits
Gennaro Gala 1, Cassio de Campos 1, Antonio Vergari 2, Erik Quaeghebeur
Probabilistic integral circuits (PICs) have been recently introduced as probabilistic models enjoying the key ingredient behind expressive generative models: continuous latent variables (LVs). PICs are symbolic computational graphs defining continuous LV models as hierarchies of functions that are summed and multiplied together, or integrated over some LVs. They are tractable if LVs can be analytically integrated out; otherwise, they can be approximated by tractable probabilistic circuits (PCs) encoding a hierarchical numerical quadrature process, called QPCs. So far, only tree-shaped PICs have been explored, and training them via numerical quadrature requires memory-intensive processing at scale. In this paper, we address these issues, and present: (i) a pipeline for building DAG-shaped PICs out of arbitrary variable decompositions, (ii) a procedure for training PICs using tensorized circuit architectures, and (iii) neural functional sharing techniques to allow scalable training.
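The idea of replacing an integral over a continuous LV with a numerical quadrature can be illustrated on a toy case. The sketch below is not the paper's QPC construction: it assumes a single latent variable z with a standard normal prior and a Gaussian conditional p(x | z) = N(x; z, cond_std^2), and the names `gaussian_pdf` and `quadrature_mixture` are hypothetical. Gauss-Hermite quadrature turns the integral p(x) = ∫ p(z) p(x | z) dz into a finite weighted sum, which is the kind of computation a sum unit in a tractable circuit performs.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian N(mean, std^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def quadrature_mixture(x, num_points=32, cond_std=1.0):
    """Approximate p(x) = ∫ N(z; 0, 1) N(x; z, cond_std^2) dz by quadrature.

    Toy assumption: one latent variable, Gaussian prior and conditional.
    The result is a finite mixture with one component per quadrature node.
    """
    # Nodes and weights for the weight function exp(-z^2 / 2).
    nodes, weights = np.polynomial.hermite_e.hermegauss(num_points)
    # Dividing by sqrt(2*pi) absorbs the N(z; 0, 1) prior; weights now sum to 1.
    weights = weights / np.sqrt(2.0 * np.pi)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # components[i, j] = p(x_i | z = nodes[j])
    components = gaussian_pdf(x[:, None], nodes[None, :], cond_std)
    # Weighted sum over nodes: a finite mixture replacing the integral.
    return components @ weights


xs = np.array([-2.0, 0.0, 1.5])
approx = quadrature_mixture(xs)
# In this toy case the integral has a closed form: N(x; 0, 1 + cond_std^2).
exact = gaussian_pdf(xs, 0.0, np.sqrt(2.0))
print(np.max(np.abs(approx - exact)))
```

Here the closed form exists only because both factors are Gaussian; the quadrature route is of interest precisely when the LV cannot be integrated out analytically.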
Marginal Causal Flows for Validation and Inference
Investigating the marginal causal effect of an intervention on an outcome from complex data remains challenging due to the inflexibility of employed models and the lack of complexity in causal benchmark datasets, which often fail to reproduce intricate real-world data patterns. In this paper we introduce Frugal Flows, a novel likelihood-based machine learning model that uses normalising flows to flexibly learn the data-generating process, while also directly inferring the marginal causal quantities from observational data.