Learning with Square Loss: Localization through Offset Rademacher Complexity

Tengyuan Liang, Alexander Rakhlin, Karthik Sridharan

arXiv.org Machine Learning 

Determining the finite-sample behavior of risk in the problem of regression is arguably one of the most basic problems of Learning Theory and Statistics. This behavior can be studied in substantial generality with the tools of empirical process theory. When functions in a given convex class are uniformly bounded, one may verify the so-called "Bernstein condition." This condition, which relates the variance of the increments of the empirical process to their expectation, implies a certain localization phenomenon around the optimum and forms the basis of the analysis via local Rademacher complexities. The technique has been developed in [9, 8, 5, 2, 4], among others, based on Talagrand's celebrated concentration inequality for the supremum of an empirical process. In a recent path-breaking paper, [14] showed that a large part of this heavy machinery is not necessary for obtaining tight upper bounds on excess loss, even (and especially) when functions are unbounded. Mendelson observed that the deviation inequality requires only one-sided control of the tail, and, fortunately, it is this tail that can be controlled under very mild assumptions. In a parallel line of work, the search within the online learning setting for an analogue of localization led to the notion of an "offset" Rademacher process [17], which yields, in a rather clean manner, optimal rates for minimax regret in online supervised learning. It was also shown that the supremum of the offset process lower-bounds the minimax value, establishing the intrinsic nature of the offset complexity.
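
For concreteness, here is a minimal sketch of the two objects the abstract names, in illustrative notation: the constants B, c > 0, the loss \ell_f, the class minimizer f^*, the sample z_1, ..., z_n, and the i.i.d. Rademacher signs \epsilon_i are assumptions of this sketch, and the paper's precise definitions may differ.

% Bernstein condition: the variance of the loss increments is dominated
% by their mean, which localizes the empirical process around f^*
% (B > 0 a constant, \ell_f the square loss of f, f^* the minimizer over \mathcal{F})
\mathbb{E}\big[(\ell_f - \ell_{f^*})^2\big] \;\le\; B\,\mathbb{E}\big[\ell_f - \ell_{f^*}\big]
\qquad \text{for all } f \in \mathcal{F}

% Offset Rademacher complexity: a Rademacher average penalized by a
% negative quadratic "offset" term (c > 0, \epsilon_i i.i.d. signs)
\mathcal{A}_n(\mathcal{F}; c) \;=\; \mathbb{E}_{\epsilon}\,
\sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n}
\Big[\, \epsilon_i f(z_i) \;-\; c\, f(z_i)^2 \,\Big]

The negative quadratic offset makes the supremum favor functions with small empirical norm, so the bound self-localizes around the optimum rather than relying on two-sided Talagrand-type concentration.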
