Assumption-Lean Post-Integrated Inference with Negative Control Outcomes

Du, Jin-Hong, Roeder, Kathryn, Wasserman, Larry

Nov-24-2024–arXiv.org Machine Learning

In the big data era, integrating information from multiple heterogeneous sources has become increasingly crucial for achieving larger sample sizes and more diverse study populations. Data integration is applied in a variety of fields, including, but not limited to, causal inference on heterogeneous populations (Shi et al., 2023), survey sampling (Yang et al., 2020), health policy (Paddock et al., 2024), retrospective psychometrics (Howe and Brown, 2023), and multi-omics biological science (Du et al., 2022). Data integration methods have been proposed to mitigate the unwanted effects of heterogeneous datasets and unmeasured covariates and to recover the common variation across datasets. However, a critical and often overlooked question is whether reliable statistical inference can be made from integrated data. Directly performing statistical inference on integrated outcomes and covariates of interest fails to account for the complex correlation structures introduced by the data integration process, and often leads to improper analyses that incorrectly assume the corrected data points are independent (Li et al., 2023). While data integration is broadly used across fields, our paper focuses on a challenging scenario involving high-dimensional outcomes.

Country:
- North America > United States
  - California (0.04)
  - Pennsylvania > Allegheny County
    - Pittsburgh (0.04)
  - New York > Albany County
    - Albany (0.04)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.92)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:
- Information Technology
  - Data Science > Data Integration (1.00)
  - Artificial Intelligence
    - Representation & Reasoning
      - Information Fusion (1.00)
      - Uncertainty > Bayesian Inference (0.67)
    - Machine Learning
      - Statistical Learning > Regression (0.47)
      - Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
