dag
Iterative Causal Discovery: Per-Edge Impossibility Certificates, Tier-Aware Oracle Queries, and the $1+K$ Lower Bound
Causal-discovery algorithms return a directed graph, yet provide no principled means of distinguishing edge directions identified by the data from those assigned without an identifying assumption. Under the standard Markov and faithfulness conditions, the observational distribution identifies only a Markov equivalence class; orientations within that class are not determined by the joint distribution and cannot be recovered from additional samples alone, but require either a functional restriction or an intervention. We introduce a protocol for observational causal discovery on continuous data that attaches to each candidate edge a discrete impossibility certificate: a RESOLVED code records the identifiability theorem under which the direction was committed, while an IMPOSSIBLE code records the failure mode together with the specific question a domain expert must answer to resolve it. The bivariate cascade is extended with five gated identifiability tiers LSNM, IGCI, Stein, MDL, and PEIT that abstain when their precondition test rejects. Two oracle primitives, the meta-hub query and the node-children query, jointly establish an upper bound of $1+K$ expert interactions sufficient to recover any DAG, where $K$ denotes the number of non-leaf vertices. Under an ideal-oracle assumption, the bound is met exactly on the asia, sachs, child, and alarm benchmarks.
Concomitant DAG Learning: On the Roles of Noise Adaptivity, Sparsity, and Non-negativity
Mateos, Gonzalo, Rey, Samuel, Ajorlou, Hamed, Tepper, Mariano
Directed acyclic graphs (DAGs) constitute a central modeling tool to enable principled reasoning about cause-effect interactions in complex systems. However, since the causal structure underlying a group of variables is often unknown and interventions may be infeasible or ethically challenging to implement, there is a need to address the task of inferring DAGs from observational data. However, most classical structure identification approaches face two key obstacles: the combinatorial challenge of enforcing acyclicity, which severely limits scalability, and identifiability challenges arising from latent confounding or heterogeneous noise. This tutorial offers an overview of recent signal processing and optimization advances that address these issues by recasting DAG structure learning as a continuous, score-based estimation problem over adjacency matrices. We begin with a didactic introduction to structural equation models and the formulation of causal graph recovery, followed by a historical survey of score-based methods ranging from early combinatorial search schemes and greedy heuristics to modern continuous frameworks that leverage smooth characterizations of acyclicity. Building on this foundation, we describe concomitant DAG estimation methods that jointly infer sparse causal structure and exogenous noise levels, improving robustness under heteroscedasticity and distribution shifts by rendering the estimator noise adaptive. All in all, the tutorial introduces readers to challenges and opportunities for signal processing research at the crossroads of causal inference, high-dimensional statistics, and scalable graph learning, while outlining emerging directions including online, nonlinear, and neural causal discovery.
A Unified Framework for Structure-Aware Clustering and Heterogeneous Causal Graph Learning
Du, Honglin, Liang, Muxuan, Zhong, Xiang
In complex multivariate systems, interactions among variables are defined by dependency structures, often encoded as directed acyclic graphs ($\text{DAGs}$). However, dependency structures can vary across subjects, and ignoring this structural heterogeneity introduces bias and obscures subpopulation-specific dependencies. To address this, we propose Directed Acyclic Graph-based Dependency Clustering via Alternating Direction Method of Multipliers (DAG-DC-ADMM), a unified framework built upon Structural Equation Modeling (SEM) that jointly learns cluster assignments and cluster-specific dependency structures. We encode acyclicity via a smooth constraint and integrate a groupwise truncated Lasso fusion penalty (gTLP) to cluster subjects based on their structural similarity. This yields a nonconvex optimization problem that incorporates sparsity, acyclicity, and structural consensus constraints. We address the nonconvexity by using the augmented Lagrangian method and solve it with an adapted version of the Alternating Direction Method of Multipliers (ADMM) for difference-of-convex programs. For certain graph structures, such as upper triangular adjacency matrices, our algorithm is guaranteed to converge to a Karush-Kuhn-Tucker (KKT) point. Experiments demonstrate that our method recovers cluster-specific causal dependency structures with a high true positive rate and a low false discovery rate. This capability enables the robust discovery of heterogeneous dependencies across subjects where the subpopulation label is unknown.
Leveraging heterogeneity for identifiability: Bayesian order-based learning of multiple DAGs
Chang, Hyunwoong, Taskin, Fariha
We propose a joint order-based scoring framework for causal structure learning of directed acyclic graph (DAG) models under heterogeneous data settings. We show that leveraging heterogeneity improves the accuracy of causal ordering estimation. In the most favorable case, the causal ordering is identifiable up to two permutations. Building on this framework, we propose an order-based Bayesian method for Gaussian DAG models and establish its theoretical properties in the high-dimensional regime. For posterior inference over the space of orderings, we introduce a random-to-random (R2R) proposal neighborhood for the Metropolis-Hastings algorithm, which is theoretically motivated and exhibits efficient mixing behavior. Simulation studies confirm the strong empirical performance of the proposed method, and an application to single-nucleus RNA sequencing data from major depressive disorder demonstrates practical utility.
Heterogeneous Ordinal Structure Learning with Bayesian Nonparametric Complexity Discovery
Public attitudes toward artificial intelligence are heterogeneous, ordinally measured, and poorly captured by any single dependency graph. Existing ordinal structure learners assume a shared directed acyclic graph (DAG) across all respondents; recent heterogeneous ordinal graphical-model approaches focus on subgroup discovery rather than confirmatory cluster-specific DAG estimation; and latent profile analyses discard dependency structure entirely. We introduce a heterogeneous ordinal structure-learning framework combining monotone Gaussian score embedding, Bayesian nonparametric (BNP) complexity discovery via a truncated stick-breaking prior, and confirmatory fixed-K estimation with cluster-specific sparse DAG learning. The key methodological insight is a discovery-to-confirmation workflow: the nonparametric stage calibrates plausible archetype complexity, while inner-validated confirmatory refitting yields stable, interpretable structural estimates. On the 2024 Pew American Trends Panel AI attitudes survey, Wave 152 (W152) survey, (N = 4,788, 8 ordinal items), the confirmatory K*=5 model reduces holdout transformed-score mean squared error (MSE) by 25.8% over a single-graph baseline and by 4.6% over mixture-only clustering. A controlled tiered semi-synthetic benchmark calibrated to W152 structure validates recovery across difficulty regimes and transparently reveals failure modes under stress conditions.
Reliable Causal Discovery with Improved Exact Search and Weaker Assumptions
Many of the causal discovery methods rely on the faithfulness assumption to guarantee asymptotic correctness. However, the assumption can be approximately violated in many ways, leading to sub-optimal solutions. Although there is a line of research in Bayesian network structure learning that focuses on weakening the assumption, such as exact search methods with well-defined score functions, they do not scale well to large graphs. In this work, we introduce several strategies to improve the scalability of exact score-based methods in the linear Gaussian setting. In particular, we develop a super-structure estimation method based on the support of inverse covariance matrix which requires assumptions that are strictly weaker than faithfulness, and apply it to restrict the search space of exact search. We also propose a local search strategy that performs exact search on the local clusters formed by each variable and its neighbors within two hops in the superstructure. Numerical experiments validate the efficacy of the proposed procedure, and demonstrate that it scales up to hundreds of nodes with a high accuracy.
Supplementary material for Dynamic Causal Bayesian Optimisation
Symbol Description Vt Set of observable variables at time t V0:TUnion of observable variables at time t= 0,...,T Xt Manipulative variables at time t Yt Target variable at time t P(Xt) Power set of Xt Mt Set of MIS sets at time t Xs,ts-th intervention set at time t In this section we give the proof for Theorem 1 in the main text. This means that W includes those variables that are parents of Yt but are nor target at previous time steps nor intervened variables. In the following proof the values of IV0:t 1, XPYs,t, IPY0:t 1 and W are denoted by i, xPY, iPY and w respectively. Finally, fYY and fNYYare the functions in the SCM for Yt (see Assumptions (1) in the main text). Eq. (2) follows from the Eq. Finally, noticing that p(yPTt |I0:t 1) is the distribution targeted when optimizing the objective function at previous time steps one can obtain Eq. (6). The derivations above show how the objective function at time t is given by the expected value of the output of the functional relationship fNYYwhere the expectation is taken with respect to the variables that are not intervened on. This expectation is then shifted to account for the interventions implemented in the system at previous time steps that are affecting the target variable through fYY .
BCDNets: Scalable Variational Approaches for Bayesian Causal Discovery
A structural equation model (SEM) is an effective framework to reason over causal relationships represented via a directed acyclic graph (DAG). Recent advances have enabled effective maximum-likelihood point estimation of DAGs from observational data. However, a point estimate may not accurately capture the uncertainty in inferring the underlying graph in practical scenarios, wherein the true DAG is non-identifiable and/or the observed dataset is limited. We propose Bayesian Causal Discovery Nets (BCD Nets), a variational inference framework for estimating a distribution over DAGs characterizing a linear-Gaussian SEM. Developing a full Bayesian posterior over DAGs is challenging due to the the discrete and combinatorial nature of graphs. We analyse key design choices for scalable VI over DAGs, such as 1) the parametrization of DAGs via an expressive variational family, 2) a continuous relaxation that enables low-variance stochastic optimization, and 3) suitable priors over the latent variables. We provide a series of experiments on real and synthetic data showing that BCDNets outperform maximum-likelihood methods on standard causal discovery metrics such as structural Hamming distance in low data regimes.