
DAG Learning on the Permutahedron

arXiv.org Artificial Intelligence

We propose a continuous optimization framework for discovering a latent directed acyclic graph (DAG) from observational data. Our approach optimizes over the polytope of permutation vectors, the so-called Permutahedron, to learn a topological ordering. Edges can be optimized jointly, or learned conditional on the ordering via a non-differentiable subroutine. Compared to existing continuous optimization approaches, our formulation has a number of advantages, including: 1. validity: it optimizes over exact DAGs, as opposed to other relaxations that optimize approximate DAGs; 2. modularity: it accommodates any edge-optimization procedure, edge structural parameterization, and optimization loss; 3. end-to-end: it either alternately iterates between node-ordering and edge-optimization, or optimizes them jointly. We demonstrate, on real-world data problems in protein-signaling and transcriptional network discovery, that our approach lies on the Pareto frontier of two key metrics, the structural intervention distance (SID) and the structural Hamming distance (SHD).

In many domains, including cell biology (Sachs et al., 2005), finance (Sanford & Moosa, 2012), and genetics (Zhang et al., 2013), the data generating process is thought to be represented by an underlying directed acyclic graph (DAG). Many models rely on DAG assumptions, e.g., causal modeling uses DAGs to model distribution shifts, ensure predictor fairness among subpopulations, or train agents more sample-efficiently (Kaddour et al., 2022). A key question, with implications ranging from better modeling to causal discovery, is how to recover this unknown DAG from observed data alone. Learning DAGs from observational data alone is fundamentally difficult for two reasons. This riddles the search space with local minima; (ii) Computation: DAG discovery is a costly combinatorial optimization problem over an exponentially large solution space and subject to global acyclicity constraints. To address issue (ii), recent work has proposed continuous relaxations of the DAG learning problem.
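The "validity" claim above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a hypothetical score vector `theta` whose sort order defines the topological ordering, and the helper names `order_mask`, `apply_mask`, and `is_acyclic` are introduced here purely for illustration. The point it shows is that any candidate edge set, once restricted to edges that go forward in the ordering, is acyclic by construction.

```python
from collections import deque


def order_mask(theta):
    """mask[i][j] = 1 iff node i precedes node j in the ordering induced by theta.

    `theta` is a hypothetical per-node score; lower score means earlier
    in the ordering, with ties broken by node index.
    """
    n = len(theta)
    order = sorted(range(n), key=lambda i: (theta[i], i))
    rank = [0] * n
    for pos, node in enumerate(order):
        rank[node] = pos
    return [[1 if rank[i] < rank[j] else 0 for j in range(n)] for i in range(n)]


def apply_mask(edges, mask):
    """Keep only the candidate edges that agree with the ordering."""
    n = len(edges)
    return [[edges[i][j] * mask[i][j] for j in range(n)] for i in range(n)]


def is_acyclic(adj):
    """Kahn's algorithm: True iff every node can be removed in topological order."""
    n = len(adj)
    indeg = [sum(1 for i in range(n) if adj[i][j]) for j in range(n)]
    queue = deque(j for j in range(n) if indeg[j] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in range(n):
            if adj[u][v]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
    return removed == n


theta = [0.3, -1.2, 2.0]                       # induces the ordering 1, 0, 2
candidate = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # fully connected, hence cyclic
dag = apply_mask(candidate, order_mask(theta))
# dag == [[0, 0, 1], [1, 0, 1], [0, 0, 0]]
```

In this sketch the ordering is fixed by a hard sort; a continuous method would instead need a relaxation of that sort so that gradients can flow to `theta`, which is outside the scope of this example.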


Differentiable DAG Sampling

arXiv.org Machine Learning

We propose a new differentiable probabilistic model over DAGs (DP-DAG). DP-DAG allows fast and differentiable DAG sampling suited to continuous optimization. To this end, DP-DAG samples a DAG by successively (1) sampling a linear ordering of the nodes and (2) sampling edges consistent with the sampled linear ordering. We further propose VI-DP-DAG, a new method for DAG learning from observational data which combines DP-DAG with variational inference. Hence, VI-DP-DAG approximates the posterior probability over DAG edges given the observed data. VI-DP-DAG is guaranteed to output a valid DAG at any time during training and, in contrast to existing differentiable DAG learning approaches, does not require any complex augmented Lagrangian optimization scheme. In our extensive experiments, we compare VI-DP-DAG to other differentiable DAG learning baselines on synthetic and real datasets. VI-DP-DAG significantly improves DAG structure and causal mechanism learning while training faster than competitors.

Directed Acyclic Graphs (DAGs) are important mathematical objects in many machine learning tasks. For example, a direct application of DAGs is to represent causal relationships in a system of variables. In this case, variables are represented as nodes and causal relationships are represented as directed edges. Hence, DAG learning has found many applications for causal discovery in biology, economics or planning (Pearl, 1988; Ramsey et al., 2017; Sachs et al., 2005; Zhang et al., 2013). However, DAG learning is a challenging problem for two reasons. First, while DAG learning with data from randomized and controlled experiments is the gold standard for causal discovery, experimental data might be hard or unethical to obtain in practice.
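The two-step sampling scheme described in the abstract, ordering first and then edges consistent with it, can be sketched as follows. This is an illustrative, non-differentiable version and not the DP-DAG implementation: the excerpt does not specify the samplers, so the sketch assumes that sorting Gumbel-perturbed logits is used to draw the ordering and that edges are drawn as independent Bernoulli variables. The names `sample_ordering`, `sample_dag`, `logits`, and `edge_probs` are hypothetical.

```python
import math
import random


def sample_ordering(logits, rng):
    """Step 1: sample a node ordering by sorting Gumbel-perturbed logits.

    Assumption: a Gumbel-based sampler is used here for illustration only.
    """
    noisy = []
    for node, logit in enumerate(logits):
        # Clamp to avoid log(0) at the boundaries of the unit interval.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel, node))
    noisy.sort(reverse=True)  # highest perturbed score comes first
    return [node for _, node in noisy]


def sample_dag(logits, edge_probs, rng):
    """Step 2: sample edges only from earlier to later nodes in the ordering."""
    n = len(logits)
    order = sample_ordering(logits, rng)
    adj = [[0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            src, dst = order[a], order[b]
            # Edge src -> dst is allowed because src precedes dst.
            if rng.random() < edge_probs[src][dst]:
                adj[src][dst] = 1
    return order, adj


rng = random.Random(0)
logits = [0.0, 1.0, -0.5, 2.0]
edge_probs = [[0.5] * 4 for _ in range(4)]
order, adj = sample_dag(logits, edge_probs, rng)
```

Because edges are only ever drawn from an earlier node to a later one, every sample is a valid DAG regardless of the parameter values, which mirrors the guarantee stated in the abstract. A differentiable variant would replace the hard sort and the Bernoulli draws with continuous relaxations.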