Pre-registration for Predictive Modeling
Hofman, Jake M., Chatzimparmpas, Angelos, Sharma, Amit, Watts, Duncan J., Hullman, Jessica
Several scientific communities are currently facing a replication crisis, wherein it has proven difficult or impossible for researchers to independently verify the results of previously published studies. Failures to replicate large swaths of experimental work (Camerer et al., 2018; Nosek et al., 2015; Begley and Ellis, 2012; Baker, 2016) have come in fields like psychology and medicine that focus on what Hofman et al. (2021) call explanatory modeling, where the goal is to identify and estimate causal effects (e.g., is there an effect of X on Y, and if so, how large is it?). While there are many different factors that can contribute to unreliable findings in explanatory modeling, the combination of small-scale experiments involving noisy measurements and the (mis)use of null hypothesis significance testing (NHST) has received a great deal of attention in recent years. Under these conditions, researchers can mistake idiosyncratic patterns in noise for true effects, resulting in unreliable findings that do not replicate upon further investigation (Button et al., 2013; Loken and Gelman, 2017; Meehl, 1990; Simmons et al., 2011). More generally, some forms of data-dependent decision making (e.g., about how to define research questions or hypotheses, how to filter or transform data, how to model data, or what tests to run) can lead to similar problems regardless of the specifics of the methods (Gelman and Loken, 2013).

What about other fields, such as machine learning and data science, that focus less on explanation and more on predictive modeling, defined in Hofman et al. (2021) as directly forecasting outcomes (e.g., how well can an outcome Y be predicted using all available features X?) without necessarily focusing on isolating individual causal effects? Predictive modeling is typically done by testing (out-of-sample) predictions on large-scale datasets, and hence, unlike explanatory modeling, involves neither small experiments nor the misuse of significance testing. With advances in statistics and machine learning (ML), we have seen remarkable performance gains in predictive modeling over the last decade, for both traditional ML tasks and scientific applications. The same methods that have been shown to achieve at or above human-level performance on tasks like playing chess, classifying images, or understanding natural language (Zhang et al.,
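The mechanism behind such unreliable findings, that small, noisy samples combined with many data-dependent analysis choices can surface "significant" effects in pure noise, is easy to see in a short simulation. The sketch below is illustrative only, not from the paper; the sample size, number of analyses, and significance threshold are arbitrary assumptions.

```python
# A minimal sketch (assumptions, not from the paper) of how many flexible
# analyses on small, noisy samples produce spurious NHST "findings".
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, n_analyses = 20, 50          # small sample, many analysis choices
false_positives = 0

for _ in range(n_analyses):
    # Both groups are drawn from the SAME distribution, so any apparent
    # "effect" between them is idiosyncratic noise, not a true effect.
    a = rng.normal(loc=0.0, scale=1.0, size=n)
    b = rng.normal(loc=0.0, scale=1.0, size=n)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(f"{false_positives}/{n_analyses} analyses 'significant' at p < 0.05 "
      "despite no true effect")
```

At a 0.05 threshold, roughly 5% of these null comparisons come up "significant" by chance alone; selectively reporting only those is the pattern pre-registration is designed to guard against.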
arXiv.org Artificial Intelligence
Nov-30-2023