chickering
Leveraging heterogeneity for identifiability: Bayesian order-based learning of multiple DAGs
Chang, Hyunwoong, Taskin, Fariha
We propose a joint order-based scoring framework for causal structure learning of directed acyclic graph (DAG) models under heterogeneous data settings. We show that leveraging heterogeneity improves the accuracy of causal ordering estimation. In the most favorable case, the causal ordering is identifiable up to two permutations. Building on this framework, we propose an order-based Bayesian method for Gaussian DAG models and establish its theoretical properties in the high-dimensional regime. For posterior inference over the space of orderings, we introduce a random-to-random (R2R) proposal neighborhood for the Metropolis-Hastings algorithm, which is theoretically motivated and exhibits efficient mixing behavior. Simulation studies confirm the strong empirical performance of the proposed method, and an application to single-nucleus RNA sequencing data from major depressive disorder demonstrates practical utility.
Discrete Causal Representation Learning
Zhang, Wenjin, Wang, Yixin, Gu, Yuqi
Causal representation learning seeks to uncover causal relationships among high-level latent variables from low-level, entangled, and noisy observations. Existing approaches often either rely on deep neural networks, which lack interpretability and formal guarantees, or impose restrictive assumptions like linearity, continuous-only observations, and strong structural priors. These limitations particularly challenge applications with a large number of discrete latent variables and mixed-type observations. To address these challenges, we propose discrete causal representation learning (DCRL), a generative framework that models a directed acyclic graph among discrete latent variables, along with a sparse bipartite graph linking latent and observed layers. This design accommodates continuous, count, and binary responses through flexible measurement models while maintaining interpretability. Under mild conditions, we prove that both the bipartite measurement graph and the latent causal graph are identifiable from the observed data distribution alone. We further propose a three-stage estimate-resample-discovery pipeline: penalized estimation of the generative model parameters, resampling of latent configurations from the fitted model, and score-based causal discovery on the resampled latents. We establish the consistency of this procedure, ensuring reliable recovery of the latent causal structure. Empirical studies on educational assessment and synthetic image data demonstrate that DCRL recovers sparse and interpretable latent causal structures.
Extremely Greedy Equivalence Search
The goal of causal discovery is to learn a directed acyclic graph from data. One of the most well-known methods for this problem is Greedy Equivalence Search (GES). GES searches for the graph by incrementally and greedily adding or removing edges to maximize a model selection criterion. It has strong theoretical guarantees on infinite data but can fail in practice on finite data. In this paper, we first identify some of the causes of GES's failure, finding that it can get blocked in local optima, especially in denser graphs. We then propose eXtremely Greedy Equivalent Search (XGES), which involves a new heuristic to improve the search strategy of GES while retaining its theoretical guarantees. In particular, XGES favors deleting edges early in the search over inserting edges, which reduces the possibility of the search ending in local optima. A further contribution of this work is an efficient algorithmic formulation of XGES (and GES). We benchmark XGES on simulated datasets with known ground truth. We find that XGES consistently outperforms GES in recovering the correct graphs, and it is 10 times faster. XGES implementations in Python and C++ are available at https://github.com/ANazaret/XGES.
Greedy equivalence search for nonparametric graphical models
One of the hallmark achievements of the theory of graphical models and Bayesian model selection is the celebrated greedy equivalence search (GES) algorithm due to Chickering and Meek. GES is known to consistently estimate the structure of directed acyclic graph (DAG) models in various special cases including Gaussian and discrete models, which are in particular curved exponential families. A general theory that covers general nonparametric DAG models, however, is missing. Here, we establish the consistency of greedy equivalence search for general families of DAG models that satisfy smoothness conditions on the Markov factorization, and hence may not be curved exponential families, or even parametric. The proof leverages recent advances in nonparametric Bayes to construct a test for comparing misspecified DAG models that avoids arguments based on the Laplace approximation. Nonetheless, when the Laplace approximation is valid and a consistent scoring function exists, we recover the classical result. As a result, we obtain a general consistency theorem for GES applied to general DAG models.
Membership Testing in Markov Equivalence Classes via Independence Query Oracles
Zhang, Jiaqi, Shiragur, Kirankumar, Uhler, Caroline
Understanding causal relationships between variables is a fundamental problem with broad impact in numerous scientific fields. While extensive research has been dedicated to learning causal graphs from data, its complementary concept of testing causal relationships has remained largely unexplored. While learning involves the task of recovering the Markov equivalence class (MEC) of the underlying causal graph from observational data, the testing counterpart addresses the following critical question: Given a specific MEC and observational data from some causal graph, can we determine if the data-generating causal graph belongs to the given MEC? We explore constraint-based testing methods by establishing bounds on the required number of conditional independence tests. Our bounds are in terms of the size of the maximum undirected clique ($s$) of the given MEC. In the worst case, we show a lower bound of $\exp(\Omega(s))$ independence tests. We then give an algorithm that resolves the task with $\exp(O(s))$ tests, matching our lower bound. Compared to the learning problem, where algorithms often use a number of independence tests that is exponential in the maximum in-degree, this shows that testing is relatively easier. In particular, it requires exponentially less independence tests in graphs featuring high in-degrees and small clique sizes. Additionally, using the DAG associahedron, we provide a geometric interpretation of testing versus learning and discuss how our testing result can aid learning.
Efficient Enumeration of Markov Equivalent DAGs
Wienöbst, Marcel, Luttermann, Malte, Bannach, Max, Liśkiewicz, Maciej
Enumerating the directed acyclic graphs (DAGs) of a Markov equivalence class (MEC) is an important primitive in causal analysis. The central resource from the perspective of computational complexity is the delay, that is, the time an algorithm that lists all members of the class requires between two consecutive outputs. Commonly used algorithms for this task utilize the rules proposed by Meek (1995) or the transformational characterization by Chickering (1995), both resulting in superlinear delay. In this paper, we present the first linear-time delay algorithm. On the theoretical side, we show that our algorithm can be generalized to enumerate DAGs represented by models that incorporate background knowledge, such as MPDAGs; on the practical side, we provide an efficient implementation and evaluate it in a series of experiments. Complementary to the linear-time delay algorithm, we also provide intriguing insights into Markov equivalence itself: All members of an MEC can be enumerated such that two successive DAGs have structural Hamming distance at most three.
Causal structure learning with momentum: Sampling distributions over Markov Equivalence Classes of DAGs
Schauer, Moritz, Wienöbst, Marcel
In the context of inferring a Bayesian network structure (directed acyclic graph, DAG for short), we devise a non-reversible continuous time Markov chain, the "Causal Zig-Zag sampler", that targets a probability distribution over classes of observationally equivalent (Markov equivalent) DAGs. The classes are represented as completed partially directed acyclic graphs (CPDAGs). The non-reversible Markov chain relies on the operators used in Chickering's Greedy Equivalence Search (GES) and is endowed with a momentum variable, which improves mixing significantly as we show empirically. The possible target distributions include posterior distributions based on a prior over DAGs and a Markov equivalent likelihood. We offer an efficient implementation wherein we develop new algorithms for listing, counting, uniformly sampling, and applying possible moves of the GES operators, all of which significantly improve upon the state-of-the-art.