Weighted Empirical Risk Minimization: Sample Selection Bias Correction based on Importance Sampling

Vogel, Robin, Achab, Mastane, Clémençon, Stéphan, Tillier, Charles

Feb-19-2020–arXiv.org Machine Learning

ABSTRACT We consider statistical learning problems in which the distribution $P'$ of the training observations $Z'_1, \ldots, Z'_n$ differs from the distribution $P$ involved in the risk one seeks to minimize (referred to as the test distribution) but is still defined on the same measurable space as $P$ and dominates it. In the unrealistic case where the likelihood ratio $\Phi(z) = dP/dP'(z)$ is known, one may straightforwardly extend the Empirical Risk Minimization (ERM) approach to this specific transfer learning setup using the same idea as that behind Importance Sampling, by minimizing a weighted version of the empirical risk functional computed from the 'biased' training data $Z'_i$ with weights $\Phi(Z'_i)$. Although the importance function $\Phi(z)$ is generally unknown, we show that, in various situations frequently encountered in practice, it takes a simple form and can be directly estimated from the $Z'_i$'s and some auxiliary information on the statistical population $P$. By means of linearization techniques, we then prove that the generalization capacity of the aforementioned approach is preserved when plugging the resulting estimates of the $\Phi(Z'_i)$'s into the weighted empirical risk. Beyond these theoretical guarantees, numerical results provide strong empirical evidence of the relevance of the approach promoted in this article.

Keywords: Statistical Learning Theory, Importance Sampling, Transfer Learning.

1 Introduction

Prediction problems are of major importance in statistical learning. The main paradigm of predictive learning is Empirical Risk Minimization (ERM in abbreviated form), see e.g. In the standard setup, $Z$ is a random variable (r.v. in short) that takes its values in a feature space $\mathcal{Z}$ with distribution $P$, $\Theta$ is a parameter space and $l: \Theta \times \mathcal{Z} \to \mathbb{R}$ is a (measurable) loss function. The risk is then defined by:

$$\forall \theta \in \Theta, \quad R_P(\theta) = \mathbb{E}_P[l(\theta, Z)], \tag{1}$$

and more generally for any measure $Q$ on $\mathcal{Z}$: $R_Q(\theta) = \int_{\mathcal{Z}} l(\theta, z)\, dQ(z)$.
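The weighted ERM idea from the abstract rests on the importance sampling identity $R_P(\theta) = \mathbb{E}_{P'}[\Phi(Z')\, l(\theta, Z')]$, which suggests minimizing $\frac{1}{n}\sum_{i=1}^{n} \Phi(Z'_i)\, l(\theta, Z'_i)$ over $\theta$. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a stratified setting in which the test-time strata probabilities are known as auxiliary information and the training-time probabilities are estimated by empirical frequencies; the function names, the squared loss and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def squared_loss(theta, z):
    """Illustrative loss l(theta, z) = (theta - z)^2 (an assumption, not from the paper)."""
    return (theta - z) ** 2


def strata_weights(strata, test_probs):
    """Estimate Phi(Z'_i) as p_k / p'_k for the stratum k of each observation.

    test_probs holds the assumed-known stratum probabilities under P;
    the probabilities under P' are estimated by training frequencies.
    """
    strata = np.asarray(strata)
    train_probs = np.bincount(strata, minlength=len(test_probs)) / len(strata)
    return np.asarray(test_probs, dtype=float)[strata] / train_probs[strata]


def weighted_empirical_risk(theta, z, weights, loss=squared_loss):
    """Weighted empirical risk (1/n) * sum_i w_i * l(theta, z_i)."""
    return float(np.mean(weights * loss(theta, z)))


def weighted_erm_mean(z, weights):
    """Closed-form minimizer of the weighted squared-loss risk (a weighted mean)."""
    return float(np.sum(weights * z) / np.sum(weights))


# Toy biased sample: stratum 0 is over-represented in training (3 of 4 points),
# while both strata are assumed equally likely at test time.
z_train = np.array([1.0, 1.0, 1.0, 5.0])
strata_train = np.array([0, 0, 0, 1])
test_probs = [0.5, 0.5]

w = strata_weights(strata_train, test_probs)
theta_plain = float(np.mean(z_train))          # unweighted ERM under squared loss
theta_weighted = weighted_erm_mean(z_train, w)  # weighted ERM under squared loss

print("weights:", w)
print("unweighted ERM:", theta_plain)
print("weighted ERM:", theta_weighted)
```

In this toy setting the test-distribution mean is 0.5 * 1 + 0.5 * 5 = 3, so the weighted minimizer recovers the target that the unweighted mean misses because of the over-represented stratum.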
