Clustering and Pruning in Causal Data Fusion

Tabell, Otto, Tikka, Santtu, Karvanen, Juha

May-22-2025–arXiv.org Machine Learning

Data fusion, the process of combining observational and experimental data, can enable the identification of causal effects that would otherwise remain non-identifiable. Although identification algorithms have been developed for specific scenarios, do-calculus remains the only general-purpose tool for causal data fusion, particularly when variables are present in some data sources but not others. However, approaches based on do-calculus may encounter computational challenges as the number of variables increases and the causal graph grows in complexity. Consequently, there exists a need to reduce the size of such models while preserving the essential features. For this purpose, we propose pruning (removing unnecessary variables) and clustering (combining variables) as preprocessing operations for causal data fusion. We generalize earlier results on a single data source and derive conditions for applying pruning and clustering in the case of multiple data sources. We give sufficient conditions for inferring the identifiability or non-identifiability of a causal effect in a larger graph based on a smaller graph and show how to obtain the corresponding identifying functional for identifiable causal effects. Examples from epidemiology and social science demonstrate the use of the results.
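To make the idea of pruning concrete, the sketch below applies one simple, well-known rule from the single-data-source setting: variables that are not ancestors of the outcome can be dropped from the graph before identification. This is only an illustration and is not the paper's condition for multiple data sources, which the abstract does not spell out. The function names, the parent-list graph encoding, and the toy graph are all hypothetical.

```python
from collections import deque


def ancestors(parents, targets):
    """Return the targets together with every node that has a directed path into them.

    `parents` maps each node to a list of its direct parents (hypothetical encoding).
    """
    seen = set(targets)
    queue = deque(targets)
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def prune_to_ancestors(parents, outcome):
    """Keep only the outcome and its ancestors; drop every other node and its edges."""
    keep = ancestors(parents, [outcome])
    return {
        node: [p for p in node_parents if p in keep]
        for node, node_parents in parents.items()
        if node in keep
    }


# Hypothetical toy graph: Z -> X -> Y and X -> W.
# W is a descendant of X but not an ancestor of Y, so it is pruned.
toy = {"Z": [], "X": ["Z"], "Y": ["X"], "W": ["X"]}
pruned = prune_to_ancestors(toy, "Y")
```

Here `pruned` retains Z, X, and Y with their edges, and W is removed. Clustering would instead merge several variables into a single node; it is not sketched here because the abstract does not state the conditions under which merging is valid.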

artificial intelligence, information fusion, input distribution

Country:
- South America > Brazil (0.04)
- Europe
  - Austria > Vienna (0.14)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Finland > Central Finland
    - Jyväskylä (0.04)

Genre:
- Research Report (1.00)

Industry:
- Health & Medicine > Therapeutic Area (0.67)

Technology:
- Information Technology
  - Data Science > Data Integration (1.00)
  - Artificial Intelligence > Representation & Reasoning
    - Information Fusion (1.00)
