Forré, Patrick
Improving Fair Predictions Using Variational Inference In Causal Models
Helwegen, Rik, Louizos, Christos, Forré, Patrick
The importance of algorithmic fairness grows with the increasing impact machine learning has on people's lives. Recent work on fairness metrics shows the need for causal reasoning in fairness constraints. In this work, a practical method named FairTrade is proposed for creating flexible prediction models that integrate fairness constraints on sensitive causal paths. The method uses recent advances in variational inference to account for unobserved confounders. Further, an outline is proposed for using the estimated causal mechanisms to audit black-box models. Experiments are conducted on simulated data and on a real dataset in the context of detecting unlawful social welfare. This research aims to contribute to machine learning techniques that honour our ethical and legal boundaries.
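For orientation, the variational objective that such latent-confounder models typically build on is the standard evidence lower bound, sketched below with a latent variable $z$ standing in for the unobserved confounder; the exact factorization and fairness terms used by FairTrade are not reproduced here.

$$\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x)\,\big\|\,p(z)\big),$$

where $q_\phi(z \mid x)$ is the amortized approximate posterior over the latent confounder and $p_\theta(x \mid z)$ the generative model; the fairness constraint on sensitive causal paths is then imposed on top of such a model.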
Pruning via Iterative Ranking of Sensitivity Statistics
Verdenius, Stijn, Stol, Maarten, Forré, Patrick
With the introduction of SNIP [arXiv:1810.02340v2], it has been demonstrated that modern neural networks can effectively be pruned before training. Yet its sensitivity criterion has since been criticized for not propagating the training signal properly, or even disconnecting layers. As a remedy, GraSP [arXiv:2002.07376v1] was introduced, at the cost of simplicity. In this work we show that by applying the sensitivity criterion iteratively in smaller steps - still before training - we can improve its performance without added implementation complexity. We call the resulting method 'SNIP-it'. We then demonstrate how it can be applied for both structured and unstructured pruning, before and/or during training, achieving state-of-the-art sparsity-performance trade-offs while already providing the computational benefits of pruning from the start of training. Furthermore, we evaluate our methods' robustness to overfitting, layer disconnection and adversarial attacks.
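A minimal sketch of the iterative sensitivity idea, assuming a PyTorch model and a single batch of data; the function and the geometric pruning schedule are illustrative assumptions, not the authors' implementation of SNIP-it.

```python
import torch

def iterative_saliency_prune(model, loss_fn, data, target, sparsity=0.9, steps=5):
    """Iteratively remove the weights with the smallest SNIP-style saliency |w * dL/dw|.

    Sketch only: pruning is emulated by zeroing weights in place; a real
    implementation would maintain and apply masks inside the forward pass.
    """
    params = [p for p in model.parameters() if p.dim() > 1]   # prune weight tensors only
    total = sum(p.numel() for p in params)
    masks = [torch.ones_like(p) for p in params]
    for step in range(1, steps + 1):
        keep = (1.0 - sparsity) ** (step / steps)             # geometric sparsity schedule
        loss = loss_fn(model(data), target)
        grads = torch.autograd.grad(loss, params)
        saliency = torch.cat([(p * g).abs().flatten() for p, g in zip(params, grads)])
        threshold = torch.topk(saliency, max(1, int(keep * total))).values.min()
        for p, g, m in zip(params, grads, masks):
            m *= ((p * g).abs() >= threshold).float()
            with torch.no_grad():
                p.mul_(m)                                      # zero out pruned weights
    return masks
```

With steps=1 this reduces to a one-shot SNIP-style criterion; the iterative variant re-evaluates saliencies on the progressively sparser network.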
Neural Ordinary Differential Equations on Manifolds
Falorsi, Luca, Forré, Patrick
Normalizing flows are a powerful technique for obtaining reparameterizable samples from complex multimodal distributions. Unfortunately, current approaches fall short when the underlying space has a non-trivial topology and are only available for the most basic geometries. Recently, normalizing flows in Euclidean space based on neural ODEs have shown great promise, yet they suffer from the same limitations. Using ideas from differential geometry and geometric control theory, we describe how neural ODEs can be extended to smooth manifolds. We show how vector fields provide a general framework for parameterizing a flexible class of invertible mappings on these spaces, and we illustrate how gradient-based learning can be performed. As a result, we define a general methodology for building normalizing flows on manifolds.
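To make the construction concrete, a toy sketch for the unit sphere $S^2 \subset \mathbb{R}^3$ is given below: an MLP defines an ambient vector field, which is projected onto the tangent space at each point, and a simple projected Euler integrator (a crude retraction, not the proper manifold solvers discussed in the paper) traces the flow. All names are illustrative.

```python
import torch
import torch.nn as nn

class SphereVectorField(nn.Module):
    """Time-dependent vector field on S^2: an MLP output projected onto the tangent space."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(4, hidden), nn.Tanh(), nn.Linear(hidden, 3))

    def forward(self, t, x):
        t_col = torch.full_like(x[..., :1], t)                 # broadcast the time to the batch
        v = self.net(torch.cat([x, t_col], dim=-1))
        return v - (v * x).sum(-1, keepdim=True) * x           # remove the normal component

def flow_on_sphere(field, x0, t1=1.0, steps=100):
    """Projected explicit Euler integration with re-normalization back onto S^2."""
    x, dt = x0, t1 / steps
    for i in range(steps):
        x = x + dt * field(i * dt, x)
        x = x / x.norm(dim=-1, keepdim=True)                   # retract back onto the sphere
    return x

# Example: push a batch of points on the sphere through the (untrained) flow.
x0 = torch.randn(16, 3)
x0 = x0 / x0.norm(dim=-1, keepdim=True)
x1 = flow_on_sphere(SphereVectorField(), x0)
```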
Designing Data Augmentation for Simulating Interventions
Ilse, Maximilian, Tomczak, Jakub M., Forré, Patrick
Machine learning models trained with purely observational data and the principle of empirical risk minimization (Vapnik, 1992) can fail to generalize to unseen domains. In this paper, we focus on the case where the problem arises through a spurious correlation between the observed domains and the actual task labels. We find that many domain generalization methods do not explicitly take this spurious correlation into account. Instead, especially in more application-oriented research areas like medical imaging or robotics, heuristic data augmentation techniques are used to learn domain-invariant features. To bridge the gap between theory and practice, we develop a causal perspective on the problem of domain generalization. We argue that causal concepts can explain the success of data augmentation by describing how it can weaken the spurious correlation between the observed domains and the task labels. We demonstrate that data augmentation can serve as a tool for simulating interventional data. Lastly, and unsurprisingly, we show that augmenting data improperly can cause a significant decrease in performance.
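The following toy sketch illustrates the point on synthetic data: a "color" feature is spuriously correlated with the label in the training domain, and an augmentation that randomizes it acts like an intervention that breaks the correlation. The data-generating process and names are invented for illustration and are not the paper's experiments.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample_domain(n, p_color_matches_label):
    """Toy domain: a causal 'signal' feature plus a spurious 'color' feature."""
    y = rng.integers(0, 2, size=n)
    signal = y + 0.5 * rng.normal(size=n)
    color = np.where(rng.random(n) < p_color_matches_label, y, 1 - y)
    return np.stack([signal, color], axis=1).astype(float), y

def augment_color(x):
    """Simulate an intervention on the spurious feature by randomizing it."""
    x = x.copy()
    x[:, 1] = rng.integers(0, 2, size=len(x))
    return x

x_tr, y_tr = sample_domain(5000, p_color_matches_label=0.9)   # training domain
x_te, y_te = sample_domain(5000, p_color_matches_label=0.1)   # unseen domain, correlation flipped

for name, xs in [("no augmentation", x_tr), ("color augmentation", augment_color(x_tr))]:
    clf = LogisticRegression().fit(xs, y_tr)
    print(name, round(clf.score(x_te, y_te), 3))
```

Without augmentation the classifier can exploit the color feature and degrades on the flipped domain; randomizing it forces reliance on the causal signal.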
Reparameterizing Distributions on Lie Groups
Falorsi, Luca, de Haan, Pim, Davidson, Tim R., Forré, Patrick
Reparameterizable densities are an important way to learn probability distributions in a deep learning setting. For many distributions it is possible to create low-variance gradient estimators by utilizing a `reparameterization trick'. Due to the absence of a general reparameterization trick, much research has recently been devoted to extending the number of reparameterizable distributional families. Unfortunately, this research has primarily focused on distributions defined in Euclidean space, ruling out the usage of one of the most influential classes of spaces with non-trivial topologies: Lie groups. In this work we define a general framework to create reparameterizable densities on arbitrary Lie groups, and provide a detailed practitioner's guide to further the ease of use. We demonstrate how to create complex and multimodal distributions on the well-known group of 3D rotations, $\operatorname{SO}(3)$, using normalizing flows. Our experiments on applying such distributions in a Bayesian setting for pose estimation on objects with discrete and continuous symmetries showcase their necessity in achieving realistic uncertainty estimates.
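A minimal sketch of the generic recipe for $\operatorname{SO}(3)$, assuming PyTorch: sample noise in the Lie algebra $\mathfrak{so}(3) \cong \mathbb{R}^3$, push it through the exponential map (Rodrigues' formula), and translate by a base rotation. It ignores the change-of-variables/density correction that the paper treats properly, and all names are illustrative, not the authors' code.

```python
import torch

def hat(omega):
    """Map omega in R^3 to the corresponding skew-symmetric matrix in so(3)."""
    wx, wy, wz = omega.unbind(-1)
    zero = torch.zeros_like(wx)
    return torch.stack([torch.stack([zero, -wz,  wy], dim=-1),
                        torch.stack([  wz, zero, -wx], dim=-1),
                        torch.stack([ -wy,  wx, zero], dim=-1)], dim=-2)

def expm_so3(omega, eps=1e-8):
    """Exponential map so(3) -> SO(3) via Rodrigues' formula."""
    theta = omega.norm(dim=-1, keepdim=True).clamp_min(eps).unsqueeze(-1)
    K = hat(omega)
    I = torch.eye(3, dtype=omega.dtype).expand(K.shape)
    return I + torch.sin(theta) / theta * K + (1 - torch.cos(theta)) / theta**2 * (K @ K)

def rsample_so3(mu, log_sigma, n=1):
    """Reparameterized sample on SO(3): noise in the algebra, exp map, left-translation by mu."""
    eps_alg = log_sigma.exp() * torch.randn(n, 3, dtype=mu.dtype)
    return mu @ expm_so3(eps_alg)                 # gradients flow into mu and log_sigma

# Example: a learnable "location" rotation and per-axis scale
# (in practice mu itself would be parameterized, e.g. as expm_so3 of a learnable vector).
mu = torch.eye(3, requires_grad=True)
log_sigma = torch.zeros(3, requires_grad=True)
samples = rsample_so3(mu, log_sigma, n=8)         # shape (8, 3, 3)
```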
Causal Calculus in the Presence of Cycles, Latent Confounders and Selection Bias
Forré, Patrick, Mooij, Joris M.
We prove the main rules of causal calculus (also called do-calculus) for interventional structural causal models (iSCMs), a generalization of a recently proposed general class of (non-)linear structural causal models that allow for cycles, latent confounders and arbitrary probability distributions. We also generalize adjustment criteria and formulas from the acyclic setting to the general one (i.e. iSCMs). Such criteria then allow one to estimate (conditional) causal effects from observational data that was (partially) gathered under selection bias and in the presence of cycles. This generalizes the backdoor criterion, the selection-backdoor criterion and extensions of these to arbitrary iSCMs. Together, our results thus enable causal reasoning in the presence of cycles, latent confounders and selection bias.
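For reference, the three rules in their classical acyclic form (Pearl's do-calculus), which the paper generalizes to iSCMs with $\sigma$-separation in place of d-separation, read as follows for disjoint node sets $X, Y, Z, W$:

\begin{align*}
\text{Rule 1: } & P(y \mid \mathrm{do}(x), z, w) = P(y \mid \mathrm{do}(x), w) && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}},\\
\text{Rule 2: } & P(y \mid \mathrm{do}(x), \mathrm{do}(z), w) = P(y \mid \mathrm{do}(x), z, w) && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}\underline{Z}},\\
\text{Rule 3: } & P(y \mid \mathrm{do}(x), \mathrm{do}(z), w) = P(y \mid \mathrm{do}(x), w) && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}\,\overline{Z(W)}},
\end{align*}

where $G_{\overline{X}}$ ($G_{\underline{Z}}$) denotes the graph with edges into $X$ (out of $Z$) removed, and $Z(W)$ is the set of $Z$-nodes that are not ancestors of any $W$-node in $G_{\overline{X}}$.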
Sinkhorn AutoEncoders
Patrini, Giorgio, Carioni, Marcello, Forré, Patrick, Bhargav, Samarth, Welling, Max, Berg, Rianne van den, Genewein, Tim, Nielsen, Frank
Optimal Transport offers an alternative to maximum likelihood for learning generative autoencoding models. We show how this principle dictates the minimization of the Wasserstein distance between the encoder's aggregated posterior and the prior, plus a reconstruction error. We prove that in the non-parametric limit the autoencoder generates the data distribution if and only if the two distributions match exactly, and that the optimum can be obtained by deterministic autoencoders. We then introduce the Sinkhorn AutoEncoder (SAE), which casts the problem as Optimal Transport on the latent space. The resulting Wasserstein distance is minimized by backpropagating through the Sinkhorn algorithm. SAE models the aggregated posterior as an implicit distribution and therefore does not need a reparameterization trick for gradient estimation. Moreover, it requires virtually no adaptation to different prior distributions. We demonstrate its flexibility by considering models with hyperspherical and Dirichlet priors, as well as a simple case of probabilistic programming. SAE matches or outperforms other autoencoding models in visual quality and FID scores.
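A sketch of the core computation, assuming PyTorch: a log-domain Sinkhorn routine that returns a differentiable entropy-regularized transport cost between encoder samples and prior samples, so gradients can be backpropagated through the iterations. The hyperparameters and the squared-Euclidean ground cost are illustrative, not the paper's settings.

```python
import math
import torch

def sinkhorn_cost(x, y, epsilon=0.1, n_iters=50):
    """Entropy-regularized OT cost between two uniformly weighted point clouds,
    computed with log-domain Sinkhorn iterations (differentiable via autograd)."""
    n, m = x.shape[0], y.shape[0]
    cost = torch.cdist(x, y) ** 2                                   # squared Euclidean ground cost
    log_a = torch.full((n,), -math.log(n))                          # uniform weights in log-space
    log_b = torch.full((m,), -math.log(m))
    f, g = torch.zeros(n), torch.zeros(m)
    for _ in range(n_iters):                                        # dual potential updates
        f = -epsilon * torch.logsumexp((g + epsilon * log_b - cost) / epsilon, dim=1)
        g = -epsilon * torch.logsumexp((f + epsilon * log_a - cost.t()) / epsilon, dim=1)
    log_plan = (f[:, None] + g[None, :] - cost) / epsilon + log_a[:, None] + log_b[None, :]
    return (log_plan.exp() * cost).sum()

# Example: match a batch of latent codes to samples from a standard normal prior.
latents = torch.randn(128, 8, requires_grad=True)                   # stand-in for encoder outputs
prior = torch.randn(128, 8)
loss = sinkhorn_cost(latents, prior)
loss.backward()                                                      # gradients reach the encoder
```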
Explorations in Homeomorphic Variational Auto-Encoding
Falorsi, Luca, de Haan, Pim, Davidson, Tim R., De Cao, Nicola, Weiler, Maurice, Forré, Patrick, Cohen, Taco S.
The manifold hypothesis states that many kinds of high-dimensional data are concentrated near a low-dimensional manifold. If the topology of this data manifold is non-trivial, a continuous encoder network cannot embed it in a one-to-one manner without creating holes of low density in the latent space. This is at odds with the Gaussian prior assumption typically made in Variational Auto-Encoders (VAEs), because the density of a Gaussian concentrates near a blob-like manifold. In this paper we investigate the use of manifold-valued latent variables. Specifically, we focus on the important case of continuously differentiable symmetry groups (Lie groups), such as the group of 3D rotations $\operatorname{SO}(3)$. We show how a VAE with $\operatorname{SO}(3)$-valued latent variables can be constructed by extending the reparameterization trick to compact connected Lie groups. Our experiments show that choosing manifold-valued latent variables that match the topology of the latent data manifold is crucial to preserve the topological structure and learn a well-behaved latent space.
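The simplest instance of the topological obstruction mentioned above is the circle: there is no continuous one-to-one map from $S^1$ into $\mathbb{R}$, so a one-dimensional Euclidean latent space cannot faithfully encode circular data. A short standard argument, included here for intuition:

Suppose $f \colon S^1 \to \mathbb{R}$ were continuous and injective. Since $S^1$ is compact and connected, $f(S^1) = [a,b]$ is a closed interval; pick $c \in (a,b)$ with unique preimage $p$. Then $S^1 \setminus \{p\}$ is connected, but its image $[a,b] \setminus \{c\}$ is disconnected, contradicting the fact that continuous images of connected sets are connected. In higher-dimensional latent spaces an embedding may exist, but the mismatch with a Gaussian prior then shows up as the low-density holes described above.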
Constraint-based Causal Discovery for Non-Linear Structural Causal Models with Cycles and Latent Confounders
Forré, Patrick, Mooij, Joris M.
We address the problem of causal discovery from data, making use of the recently proposed causal modeling framework of modular structural causal models (mSCM) to handle cycles, latent confounders and non-linearities. We introduce $\sigma$-connection graphs ($\sigma$-CG), a new class of mixed graphs (containing undirected, bidirected and directed edges) with additional structure, and extend the concept of $\sigma$-separation, the appropriate generalization of the well-known notion of d-separation in this setting, to apply to $\sigma$-CGs. We prove the closedness of $\sigma$-separation under marginalisation and conditioning and exploit this to implement a test of $\sigma$-separation on a $\sigma$-CG. This then leads us to the first causal discovery algorithm that can handle non-linear functional relations, latent confounders, cyclic causal relationships, and data from different (stochastic) perfect interventions. As a proof of concept, we show on synthetic data how well the algorithm recovers features of the causal graph of modular structural causal models.
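As a reminder of the classical notion that $\sigma$-separation generalizes, d-separation in a DAG $G$ can be stated as

$$A \;\perp_d\; B \mid Z \quad :\Longleftrightarrow\quad \text{every path between } A \text{ and } B \text{ contains a non-collider in } Z \text{ or a collider not in } \operatorname{An}(Z),$$

where $\operatorname{An}(Z)$ denotes $Z$ together with its ancestors. Roughly speaking, $\sigma$-separation weakens the non-collider blocking condition using the strongly connected components of the graph so that the criterion remains sound for cyclic models; the precise definition is given in the paper.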
Markov Properties for Graphical Models with Cycles and Latent Variables
Forré, Patrick, Mooij, Joris M.
We investigate probabilistic graphical models that allow for both cycles and latent variables. For this we introduce directed graphs with hyperedges (HEDGes), generalizing and combining both marginalized directed acyclic graphs (mDAGs), which can model latent (dependent) variables, and directed mixed graphs (DMGs), which can model cycles. We define and analyse several different Markov properties that relate the graphical structure of a HEDG to a probability distribution on a corresponding product space over the set of nodes, for example factorization properties, structural equation properties, ordered/local/global Markov properties, and marginal versions of these. The various Markov properties for HEDGes are in general not equivalent to each other when cycles or hyperedges are present, in contrast to the simpler case of directed acyclic graphical (DAG) models (also known as Bayesian networks). We show how the Markov properties for HEDGes - and thus the corresponding graphical Markov models - are logically related to each other.
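For orientation, in the acyclic, fully observed case these properties reduce to (and are equivalent to) the familiar Bayesian-network conditions, e.g. the factorization and the directed global Markov property:

$$p(x) \;=\; \prod_{v \in V} p\big(x_v \mid x_{\operatorname{pa}(v)}\big), \qquad A \perp_d B \mid C \text{ in } G \;\Longrightarrow\; X_A \perp\!\!\!\perp X_B \mid X_C.$$

The contribution of the paper is to analyse how such properties, suitably redefined, relate to each other once cycles and hyperedges (latent structure) are allowed.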