mag
Local Covariate Selection for Average Causal Effect Estimation without Pretreatment and Causal Sufficiency Assumptions
Liu, Zeyu, Li, Zheng, Xie, Feng, Zeng, Yan, Zhang, Hao, Zhang, Kun
We study the problem of selecting covariates for unbiased estimation of the total causal effect. Existing approaches typically rely on global causal structure learning over all variables, or on strong assumptions such as causal sufficiency - where observed variables share no latent confounders - or the pretreatment assumption, which limits covariates to those unaffected by the treatment or outcome. These requirements are often unrealistic in practice, and global learning becomes computationally prohibitive in high-dimensional settings. To address these challenges, we propose a novel local learning method for covariate selection in nonparametric causal effect estimation that avoids both the pretreatment and causal sufficiency assumptions. We first characterize a local boundary that contains at least one valid adjustment set whenever one exists for identifying the causal effect, and then develop local identification procedures to efficiently search within this boundary. We prove that the proposed method is sound and complete. Experiments on multiple synthetic datasets and two real-world datasets show that our approach achieves accurate causal effect estimation while substantially improving computational efficiency.
Appendix A Removable Variables
In this section, we first prove the proposed graphical representation for a removable variable in a MAG $\mathcal{M}$ (Theorem 1). Then, we discuss how this representation reduces to Theorem 5 of [11] in the case of DAGs. Throughout our proofs, we say a path between $X$ and $Y$ is blocked by a set $\mathbf{W}$ if it is not m-connecting relative to $\mathbf{W}$. In this case, there exists a non-collider $W$ on the path which is a member of $\mathbf{W}$, or there exists a collider $W$ on the path such that $W \notin \mathrm{Anc}(\{X, Y\} \cup \mathbf{W})$. In both cases we say $W$ blocks this path with respect to $\mathbf{W}$, or $W$ blocks the path in short when $\mathbf{W}$ is clear from the context. We say $X$ is a descendant of $Y$ if $Y \in \mathrm{Anc}(X)$, and we denote by $\mathrm{De}_{\mathcal{M}}(X)$ the set of descendants of $X$ in the MAG $\mathcal{M}$, and by $\mathrm{De}(X)$ whenever the graph is clear from the context.

A.1 Graphical representation

Theorem 1. Vertex $X$ is removable in a MAG $\mathcal{M}$ over the variables $\mathbf{V}$ if and only if
1. for any $Y \in \mathrm{Adj}(X)$ and $Z \in \mathrm{Ch}(X) \cup N(X) \setminus \{Y\}$, $Y$ and $Z$ are adjacent, and
2. for any collider path $u = (X, V_1, \dots, V_m, Y)$ and $Z \in \mathbf{V} \setminus \{X, Y, V_1, \dots, V_m\}$ such that $\{X, V_1, \dots, V_m\} \subseteq \mathrm{Pa}(Z)$, $Y$ and $Z$ are adjacent.

Let $\mathcal{H}$ denote the induced subgraph of $\mathcal{M}$ over $\mathbf{V} \setminus \{X\}$. For any $\mathbf{W} \subseteq \mathbf{V} \setminus \{X, Y, Z\}$, $(Z, X, Y)$ is an m-connecting path relative to $\mathbf{W}$ in $\mathcal{M}$, as $X$ is a non-collider and $X \notin \mathbf{W}$. That is, no such $\mathbf{W}$ can m-separate $Y$ and $Z$. Since $X$ is removable in $\mathcal{M}$, by definition of removability,
$(Y \perp Z \mid \mathbf{W})_{\mathcal{M}} \iff (Y \perp Z \mid \mathbf{W})_{\mathcal{H}}$.

Again for any $\mathbf{W} \subseteq \mathbf{V} \setminus \{X, Y, Z\}$, $(Z, X, V_1, \dots, V_m, Y)$ is an m-connecting path relative to $\mathbf{W}$ in $\mathcal{M}$ since I) every collider on this path is a parent (and therefore an ancestor) of $Z$, and II) $X \notin \mathbf{W}$ and $X$ is the only non-collider on this path. That is, no such $\mathbf{W}$ can m-separate $Y$ and $Z$. Since $X$ is removable in $\mathcal{M}$, Equation 8 implies that $Y$ and $Z$ have no m-separating sets in $\mathcal{H}$. Hence, $Y$ is adjacent to $Z$ in $\mathcal{H}$, and therefore, in $\mathcal{M}$.

if part: We need to prove that for any $Y, Z \in \mathbf{V} \setminus \{X\}$ and any $\mathbf{W} \subseteq \mathbf{V} \setminus \{X, Y, Z\}$, $(Y \perp Z \mid \mathbf{W})_{\mathcal{M}} \iff (Y \perp Z \mid \mathbf{W})_{\mathcal{H}}$.
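The two conditions of Theorem 1 are purely graphical, so they can be checked directly on an edge-mark encoding of a MAG. The following is a minimal Python sketch under several assumptions that are not part of the source: the helper names `make_mag` and `is_removable` are hypothetical, the dictionary encoding of edge marks is our own choice, N(X) is read as the set of vertices joined to X by an undirected edge, the set in condition 1 is read as (Ch(X) ∪ N(X)) \ {Y}, and collider paths are enumerated with m ≥ 1 (the case m = 0 is already covered by condition 1). The path enumeration is exhaustive and may be exponential in the worst case; it is meant for illustration on small graphs only.

```python
ARROW, TAIL = ">", "-"

def make_mag(directed=(), bidirected=(), undirected=()):
    """Hypothetical encoding: mark[(a, b)] is the edge mark at b on edge a *-* b."""
    mark = {}
    for a, b in directed:      # a -> b
        mark[(a, b)], mark[(b, a)] = ARROW, TAIL
    for a, b in bidirected:    # a <-> b
        mark[(a, b)], mark[(b, a)] = ARROW, ARROW
    for a, b in undirected:    # a - b
        mark[(a, b)], mark[(b, a)] = TAIL, TAIL
    return mark

def is_removable(x, vertices, mark):
    """Check conditions 1 and 2 of Theorem 1 for vertex x (illustrative sketch)."""
    def adjacent(a, b):
        return (a, b) in mark

    def is_parent(a, b):       # a -> b
        return mark.get((a, b)) == ARROW and mark.get((b, a)) == TAIL

    adj = {v for v in vertices if adjacent(x, v)}
    children = {v for v in adj if is_parent(x, v)}
    neighbours = {v for v in adj
                  if mark[(x, v)] == TAIL and mark[(v, x)] == TAIL}

    # Condition 1: every Y in Adj(X) is adjacent to every
    # Z in (Ch(X) ∪ N(X)) \ {Y}.
    for y in adj:
        for z in (children | neighbours) - {y}:
            if not adjacent(y, z):
                return False

    # Condition 2: for every child Z of X, grow collider paths
    # (X, V1, ..., Vm) whose interior vertices are all parents of Z,
    # and require each admissible endpoint Y to be adjacent to Z.
    for z in children:
        stack = [[x]]
        while stack:
            path = stack.pop()
            last = path[-1]
            for w in vertices:
                if w in path or w == z or not adjacent(last, w):
                    continue
                # An interior vertex must have an arrowhead from w as well.
                if last != x and mark[(w, last)] != ARROW:
                    continue
                # w as endpoint Y of a collider path with m >= 1.
                if last != x and not adjacent(w, z):
                    return False
                # w as the next interior collider: arrowhead into w, w -> Z.
                if mark[(last, w)] == ARROW and is_parent(w, z):
                    stack.append(path + [w])
    return True

# Usage on two three-vertex DAGs.
chain = make_mag(directed=[("Y", "X"), ("X", "Z")])      # Y -> X -> Z
collider = make_mag(directed=[("Y", "X"), ("Z", "X")])   # Y -> X <- Z
print(is_removable("X", ["X", "Y", "Z"], chain))      # False
print(is_removable("X", ["X", "Y", "Z"], collider))   # True
```

In the chain, Y and Z are m-connected through X but non-adjacent, so condition 1 fails; in the collider graph X is a sink with no children or undirected neighbours, so both conditions hold vacuously.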
The Geometric Alignment Tax: Tokenization vs. Continuous Geometry in Scientific Foundation Models
Foundation models for biology and physics optimize predictive accuracy, but their internal representations systematically fail to preserve the continuous geometry of the systems they model. We identify the root cause: the Geometric Alignment Tax, an intrinsic cost of forcing continuous manifolds through discrete categorical bottlenecks. Controlled ablations on synthetic dynamical systems demonstrate that replacing cross-entropy with a continuous head on an identical encoder reduces geometric distortion by up to 8.5x, while learned codebooks exhibit a non-monotonic double bind where finer quantization worsens geometry despite improving reconstruction. Under continuous objectives, three architectures differ by 1.3x; under discrete tokenization, they diverge by 3,000x. Evaluating 14 biological foundation models with rate-distortion theory and MINE, we identify three failure regimes: Local-Global Decoupling, Representational Compression, and Geometric Vacuity. A controlled experiment confirms that Evo 2's reverse-complement robustness on real DNA reflects conserved sequence composition, not learned symmetry. No model achieves simultaneously low distortion, high mutual information, and global coherence.
Complete Causal Identification from Ancestral Graphs under Selection Bias
Many causal discovery algorithms, including the celebrated FCI algorithm, output a Partial Ancestral Graph (PAG). PAGs serve as an abstract graphical representation of the underlying causal structure, modeled by directed acyclic graphs with latent and selection variables. This paper develops a characterization of the set of extended-type conditional independence relations that are invariant across all causal models represented by a PAG. This theory allows us to formulate a general measure-theoretic version of Pearl's causal calculus and a sound and complete identification algorithm for PAGs under selection bias. Our results also apply when PAGs are learned by certain algorithms that integrate observational data with experimental data and incorporate background knowledge.
Collaborative Causal Discovery with Atomic Interventions
As interventions are expensive (they require carefully controlled experiments) and performing multiple interventions is time-consuming, an important goal in causal discovery is to design algorithms that utilize simple (preferably, single-variable) and fewer interventions [Shanmugam et al., 2015]. However, when there are latents or unobserved variables in the system, in the worst case, it is not possible to learn the exact causal DAG without intervening on every variable at least once.