Assumption-lean and Data-adaptive Post-Prediction Inference

Jiacheng Miao, Xinran Miao, Yixuan Wu, Jiwei Zhao, Qiongshi Lu

Nov-23-2023–arXiv.org Machine Learning

A fundamental challenge in modern scientific research is the acquisition of gold-standard data (Wang et al., 2023). Such data are highly accurate and reliable, and they are essential to the validity of scientific discoveries, but obtaining them is often costly and labor-intensive. Fortunately, the advent and rapid development of machine learning (ML) have made it possible to predict outcomes from accessible covariates (He et al., 2016; LeCun et al., 2015). A prominent example is AlphaFold (Jumper et al., 2021), which uses readily available protein amino acid sequences to accurately predict protein structures that traditionally require extensive experimental effort to determine. This ML-based approach has shown the potential to substantially reduce the time and resources needed to obtain gold-standard data (Cheng et al., 2023; Stokes et al., 2020).

Despite these benefits, replacing gold-standard data with ML predictions introduces new challenges, particularly for the validity of downstream statistical analyses. Using such predictions indiscriminately, as if they were observed gold-standard data, can bias results and lead to misleading scientific conclusions (Wang et al., 2020). Statistical analyses that use imputed gene expression from the Genotype-Tissue Expression (GTEx) project exemplify this issue.
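The bias concern can be made concrete with a small simulation, sketched here in Python using only the standard library. Everything in it is hypothetical: the data-generating model, the deliberately biased predictor, and the function names `simulate`, `naive_mean`, `labeled_only_mean` and `prediction_powered_mean`. The correction shown is the simple mean estimator from the related prediction-powered inference (PPI) literature, swapped in for illustration. It is not the assumption-lean, data-adaptive method proposed in this paper.

```python
import random
import statistics


def simulate(n_labeled=500, n_unlabeled=20_000, bias=0.5, seed=0):
    """Hypothetical setup: Y = X + noise (true mean of Y is 0), and an ML
    model whose predictions are systematically shifted by `bias`.
    Returns (labeled (y, yhat) pairs, predictions on unlabeled data)."""
    rng = random.Random(seed)

    def draw():
        x = rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, 0.5)  # gold-standard outcome
        yhat = x + bias              # hypothetical biased ML prediction
        return y, yhat

    labeled = [draw() for _ in range(n_labeled)]
    # For unlabeled samples only the prediction is kept; y is never observed.
    unlabeled_preds = [draw()[1] for _ in range(n_unlabeled)]
    return labeled, unlabeled_preds


def naive_mean(unlabeled_preds):
    # Treats predictions as if they were observed gold-standard outcomes.
    return statistics.fmean(unlabeled_preds)


def labeled_only_mean(labeled):
    # Ignores predictions and uses the small gold-standard sample alone.
    return statistics.fmean(y for y, _ in labeled)


def prediction_powered_mean(labeled, unlabeled_preds):
    # Mean of predictions, corrected by the average prediction error
    # (the "rectifier") estimated on the labeled gold-standard sample.
    rectifier = statistics.fmean(y - yhat for y, yhat in labeled)
    return statistics.fmean(unlabeled_preds) + rectifier


if __name__ == "__main__":
    labeled, preds = simulate()
    print("naive:             ", naive_mean(preds))
    print("labeled only:      ", labeled_only_mean(labeled))
    print("prediction-powered:", prediction_powered_mean(labeled, preds))
```

Running the script, the naive estimate should land near the hypothetical bias of 0.5, while the labeled-only and prediction-powered estimates center near the true mean of 0. The rectifier removes the bias regardless of prediction quality; whether it also improves precision over the labeled-only mean depends on how closely the predictions track the outcome.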

