Improving generalisation via anchor multivariate analysis

Durand, Homer, Varando, Gherardo, Mankovich, Nathan, Camps-Valls, Gustau

arXiv.org Machine Learning 

Data sources in contemporary machine learning applications are often heterogeneous, which can lead to distribution shifts [Sugiyama and Kawanabe, 2012, Shen et al., 2021]. This problem is particularly relevant in computer vision [Csurka, 2017], healthcare [Zhang et al., 2021], finance, Earth and climate sciences [Tuia et al., 2016, Kellenberger et al., 2021] and the social sciences, as variations in data patterns can significantly affect model performance and generalisation in the out-of-distribution (OOD) setting, also referred to as domain generalisation [Shen et al., 2021, Zhou et al., 2023]. Various frameworks have been proposed to formally address distribution shifts that emerge at test time [Peters et al., 2016, Arjovsky et al., 2020]. When the data distribution is entailed by a Structural Causal Model (SCM) [Peters et al., 2017], one can consider distribution shifts arising from interventions on specific variables of the SCM. Notably, Instrumental Variable (IV) regression is robust to arbitrarily strong interventions [Bowden and Turkington, 1990]. However, pursuing algorithms that are robust to arbitrarily strong interventions may be overly conservative, especially when prior knowledge is available about the strength of the intervention that generates the distribution shift. Anchor Regression (AR) addresses this challenge by explicitly considering interventions on exogenous variables up to a specified strength [Rothenhäusler et al., 2018].
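
To make the role of the intervention strength concrete, the following is a minimal sketch of an anchor regression estimator in Python. It assumes the commonly used formulation in which the residual is split into the part explained by the anchors (projection P_A) and the remainder, with the anchor-explained part weighted by a parameter gamma; this objective is not spelled out in the text above, and the function name `anchor_regression`, the parameter `gamma`, and the simulated data are illustrative assumptions only.

```python
import numpy as np


def anchor_regression(X, Y, A, gamma):
    """Hypothetical helper: anchor regression via a data transformation.

    Assumed objective (not stated in the text above):
        ||(I - P_A)(Y - X b)||^2 + gamma * ||P_A (Y - X b)||^2,
    where P_A projects onto the column space of the anchors A.
    """
    # Centre all variables so that no intercept term is needed.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    A = A - A.mean(axis=0)

    def proj(M):
        # P_A @ M, computed by least squares without forming the n x n projector.
        coef, *_ = np.linalg.lstsq(A, M, rcond=None)
        return A @ coef

    # Applying W = I - (1 - sqrt(gamma)) P_A turns the objective into plain OLS.
    s = np.sqrt(gamma)
    X_t = X - (1.0 - s) * proj(X)
    Y_t = Y - (1.0 - s) * proj(Y)
    b, *_ = np.linalg.lstsq(X_t, Y_t, rcond=None)
    return b


# Simulated data (illustrative only): anchor A, hidden confounder H.
rng = np.random.default_rng(0)
n = 500
A = rng.normal(size=(n, 1))
H = rng.normal(size=(n, 1))
X = A + H + rng.normal(size=(n, 1))
Y = 2.0 * X + H + rng.normal(size=(n, 1))

b_ols = anchor_regression(X, Y, A, gamma=1.0)   # gamma = 1 reduces to OLS
b_rob = anchor_regression(X, Y, A, gamma=10.0)  # larger gamma: stronger protection
```

Under this assumed objective, gamma = 1 recovers ordinary least squares, and increasing gamma penalises the part of the residual that the anchors explain more heavily, which corresponds to guarding against stronger interventions on the exogenous variables.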
