Chaining Bounds for Empirical Risk Minimization

Balázs, Gábor, György, András, Szepesvári, Csaba

arXiv.org Machine Learning 

This paper extends the standard chaining technique (e.g., Pollard, 1990; Dudley, 1999; Györfi et al., 2002; Boucheron et al., 2012) to prove high-probability excess risk upper bounds for empirical risk minimization (ERM) for random design settings even if the magnitude of the noise and the estimates is unbounded. Our result (Theorem 1) covers bounded settings (Bartlett et al., 2005; Koltchinskii, 2011), extends to sub-Gaussian or even subexponential noise(van de Geer, 2000; Györfi and Wegkamp, 2008), and handles hypothesis classes with unbounded magnitude (Lecué and Mendelson, 2013; Mendelson, 2014; Liang et al., 2015). Furthermore, it applies to many loss functions besides the squared loss, and does not need additional statistical assumptions such as the bounded kurtosis of the transformed covariates over the hypothesis class, which prevent the latest developments to provide tight excess risk bounds for many sub-Gaussian cases (Section 1.2). To demonstrate the effectiveness of our method for such unbounded settings, we use our general excess risk bound (Theorem 1) to provide a detailed analysis for linear least squares estimators using quadratic slope constraint and penalty with sub-Gaussian noise and domain for the random design, nonrealizable setting(Section 3). Our result for the slope constrained case extends Theorem A of Lecué and Mendelson (2013) and nearly proves the conjecture of Shamir (2015), while our treatment for the penalized case (ridge regression) is comparable to the work of Hsu et al. (2014). The rest of this section introduces our notation through the formal definition of the regression problem and ERM estimators (Section 1.1), and discusses the limitations of 1 current excess risk upper bounds in the literature (Section 1.2). Then, we provide our main result in Section 2 to upper bound the excess risk of ERM estimators, and discuss its properties for various settings including many loss functions besides the squared loss. Next, Section 3 provides a detailed analysis for linear least squares estimators including the slope constrained case (Section 3.1) and ridge regression (Section 3.2). Finally, Section 4 proves our main result (Theorem 1).

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found