A new similarity measure for covariate shift with applications to nonparametric regression

Pathak, Reese, Ma, Cong, Wainwright, Martin J.

Feb-6-2022–arXiv.org Machine Learning

In the standard formulation of prediction or classification, future data (as represented by a test set) are assumed to be drawn from the same distribution as the training data. This assumption, while theoretically convenient, may fail to hold in many real-world scenarios. For instance, training data might be collected only from a sub-group within a broader population (such as in medical trials), or the environment might change over time as data are collected. Such scenarios result in a distribution mismatch between the training and test data. In this paper, we study an important case of such distribution mismatch, namely the covariate shift problem (e.g., [21, 19]). Suppose that a statistician observes covariate-response pairs (X, Y), and wishes to build a prediction rule. In the problem of covariate shift, the distribution of the covariates X is allowed to change between the training and test data, while the posterior distribution of the responses (namely, Y | X) remains fixed. Compared to the usual i.i.d.
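
The covariate shift setting described in the abstract can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the regression function sin(x), the Gaussian covariate distributions, the noise level, and the Nadaraya-Watson kernel smoother are all hypothetical choices made here, not the estimator or the similarity measure studied in the paper. Training and test data share the same conditional distribution of Y given X, and only the distribution of X differs.

```python
import numpy as np


def regression_function(x):
    # Hypothetical conditional mean E[Y | X = x]; identical for train and test.
    return np.sin(x)


def sample(n, covariate_mean, rng):
    # Only the covariate distribution depends on `covariate_mean`;
    # the law of Y given X is the same in every call.
    x = rng.normal(loc=covariate_mean, scale=1.0, size=n)
    y = regression_function(x) + 0.1 * rng.normal(size=n)
    return x, y


def nadaraya_watson(x_train, y_train, x_query, bandwidth=0.3):
    # Gaussian-kernel weighted average of training responses.
    scaled = (x_query[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * scaled**2)
    # Guard against division by zero if every weight underflows.
    return (weights @ y_train) / np.maximum(weights.sum(axis=1), 1e-300)


def excess_risk(x_train, y_train, x_test):
    # Mean squared distance to the true regression function on test covariates.
    predictions = nadaraya_watson(x_train, y_train, x_test)
    return float(np.mean((predictions - regression_function(x_test)) ** 2))


rng = np.random.default_rng(0)
x_train, y_train = sample(2000, 0.0, rng)   # training covariates ~ N(0, 1)
x_same, _ = sample(2000, 0.0, rng)          # test covariates, no shift
x_shifted, _ = sample(2000, 2.0, rng)       # test covariates ~ N(2, 1)

risk_same = excess_risk(x_train, y_train, x_same)
risk_shifted = excess_risk(x_train, y_train, x_shifted)
print(f"excess risk without shift: {risk_same:.4f}")
print(f"excess risk under covariate shift: {risk_shifted:.4f}")
```

In this toy setup the shifted test covariates fall partly in regions where training points are sparse, so the same fitted rule is expected to incur a larger error there even though the relationship between X and Y has not changed.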

covariate shift, similarity measure, transfer exponent

Country:
- North America > United States
  - New York (0.04)
  - Illinois > Cook County
    - Chicago (0.04)
- Europe
  - Czechia > Prague (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.14)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)