ci test
On the Number of Conditional Independence Tests in Constraint-based Causal Discovery
Monés, Marc Franquesa, Zhang, Jiaqi, Uhler, Caroline
Learning causal relations from observational data is a fundamental problem with wide-ranging applications across many fields. Constraint-based methods infer the underlying causal structure by performing conditional independence tests. However, existing algorithms such as the prominent PC algorithm need to perform a large number of independence tests, which in the worst case is exponential in the maximum degree of the causal graph. Despite extensive research, it remains unclear if there exist algorithms with better complexity without additional assumptions. Here, we establish an algorithm that achieves a better complexity of $p^{\mathcal{O}(s)}$ tests, where $p$ is the number of nodes in the graph and $s$ denotes the maximum undirected clique size of the underlying essential graph. Complementing this result, we prove that any constraint-based algorithm must perform at least $2^{Ω(s)}$ conditional independence tests, establishing that our proposed algorithm achieves exponent-optimality up to a logarithmic factor in terms of the number of conditional independence tests needed. Finally, we validate our theoretical findings through simulations, on semi-synthetic gene-expression data, and real-world data, demonstrating the efficiency of our algorithm compared to existing methods in terms of number of conditional independence tests needed.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Virginia > Arlington County > Arlington (0.04)
- South America > Paraguay > Asunción > Asunción (0.04)
- (6 more...)
Federated Causal Discovery Across Heterogeneous Datasets under Latent Confounding
Hahn, Maximilian, Zajak, Alina, Heider, Dominik, Ribeiro, Adèle Helena
Causal discovery across multiple datasets is often constrained by data privacy regulations and cross-site heterogeneity, limiting the use of conventional methods that require a single, centralized dataset. To address these challenges, we introduce fedCI, a federated conditional independence test that rigorously handles heterogeneous datasets with non-identical sets of variables, site-specific effects, and mixed variable types, including continuous, ordinal, binary, and categorical variables. At its core, fedCI uses a federated Iteratively Reweighted Least Squares (IRLS) procedure to estimate the parameters of generalized linear models underlying likelihood-ratio tests for conditional independence. Building on this, we develop fedCI-IOD, a federated extension of the Integration of Overlapping Datasets (IOD) algorithm, that replaces its meta-analysis strategy and enables, for the fist time, federated causal discovery under latent confounding across distributed and heterogeneous datasets. By aggregating evidence federatively, fedCI-IOD not only preserves privacy but also substantially enhances statistical power, achieving performance comparable to fully pooled analyses and mitigating artifacts from low local sample sizes. Our tools are publicly available as the fedCI Python package, a privacy-preserving R implementation of IOD, and a web application for the fedCI-IOD pipeline, providing versatile, user-friendly solutions for federated conditional independence testing and causal discovery.
- Europe > Germany (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Colorado > Boulder County > Boulder (0.04)
- (2 more...)
Appendix A PCMCI Algorithm
The PCMCI algorithm is proposed by Runge et al. [2019], aiming to detect time-lagged causal See Fig.1 for more detail. A simple proof is shown below through Markov assumption ( A2). 3 Figure 2: Partial causal graph for 3-variate time series Fig.2 shows a partial causal graph for a 3-variate time series with Semi-Stationary SCM. However, they may not share the same marginal distribution. Still in Fig.2, based on the definition of homogenous time partition, time partition subset Based on Eq.(12) and Eq.(17), we have: p(X Without loss of generality, we assume T is a multiple of δ all the time. A1-A7 and with an oracle (infinite sample size limit), we have that: null G = G (47) almost surely.
- North America > United States > Arizona (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
- Health & Medicine (0.46)
- Energy (0.46)
A Supplement
Here we provide proofs of the statements made in the main text as well as further figures of numerical experiments and a more detailed discussion of heteroskedasticity effects regarding causal discovery. Z. Testing whether the Pearson correlation between X and Y is zero is equivalent to testing whether the slope parameter β is equal to zero. Therefore, this is a homoskedastic problem. A.1.2 Discussion of Effect 2: We start by discussing the homoskedastic case to see where non-constant variance of noise leads to problems within the t-test. For homoskedastic noise the second factor is an estimator of the standard error of ˆβ, which we derive by using the mean of the squared residual as an estimator for the error variance.
- Europe > Germany > Berlin (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Spain > Canary Islands (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada (0.04)
- (6 more...)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)