Clustering and Pruning in Causal Data Fusion
Otto Tabell, Santtu Tikka, Juha Karvanen
Data fusion--the process of combining observational and experimental data--can enable the identification of causal effects that would otherwise remain non-identifiable. Although identification algorithms have been developed for specific scenarios, do-calculus remains the only general-purpose tool for causal data fusion, particularly when variables are present in some data sources but not others. However, approaches based on do-calculus may encounter computational challenges as the number of variables increases and the causal graph grows in complexity. Consequently, there exists a need to reduce the size of such models while preserving the essential features. For this purpose, we propose pruning (removing unnecessary variables) and clustering (combining variables) as preprocessing operations for causal data fusion. We generalize earlier results on a single data source and derive conditions for applying pruning and clustering in the case of multiple data sources. We give sufficient conditions for inferring the identifiability or non-identifiability of a causal effect in a larger graph based on a smaller graph and show how to obtain the corresponding identifying functional for identifiable causal effects. Examples from epidemiology and social science demonstrate the use of the results.
May-22-2025
- Country:
  - Europe > Austria > Vienna (0.14)
  - Europe > Finland > Central Finland > Jyväskylä (0.04)
  - Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
  - South America > Brazil (0.04)
- Genre:
  - Research Report (1.00)
- Industry:
  - Health & Medicine > Therapeutic Area (0.67)
- Technology: