Learning with Square Loss: Localization through Offset Rademacher Complexity
Tengyuan Liang, Alexander Rakhlin, Karthik Sridharan
Determining the finite-sample behavior of risk in the problem of regression is arguably one of the most basic problems of Learning Theory and Statistics. This behavior can be studied in substantial generality with the tools of empirical process theory. When functions in a given convex class are uniformly bounded, one may verify the so-called "Bernstein condition." The condition, which relates the variance of the increments of the empirical process to their expectation, implies a certain localization phenomenon around the optimum and forms the basis of the analysis via local Rademacher complexities. The technique has been developed in [9, 8, 5, 2, 4], among others, based on Talagrand's celebrated concentration inequality for the supremum of an empirical process. In a recent path-breaking paper, [14] showed that a large part of this heavy machinery is not necessary for obtaining tight upper bounds on excess loss, even (and especially) if functions are unbounded. Mendelson observed that only one-sided control of the tail is required in the deviation inequality, and, thankfully, it is the tail that can be controlled under very mild assumptions. In a parallel line of work, the search within the online learning setting for an analogue of "localization" has led to a notion of an "offset" Rademacher process [17], yielding, in a rather clean manner, optimal rates for minimax regret in online supervised learning. It was also shown that the supremum of the offset process is a lower bound on the minimax value, thus establishing its intrinsic nature.
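As a pointer for the reader, the central object named in the abstract, the offset Rademacher complexity, is commonly written in the following form; this is a sketch of the standard definition rather than a quote from the paper, and the constant c governing the quadratic offset is generic:

```latex
\[
\mathcal{R}^{\mathrm{off}}_{n}(\mathcal{F}; c)
\;=\;
\mathbb{E}\,\sup_{f \in \mathcal{F}}
\frac{1}{n}\sum_{i=1}^{n}
\Bigl(\epsilon_i\, f(x_i) \;-\; c\, f(x_i)^{2}\Bigr),
\]
```

where the \(\epsilon_i\) are i.i.d. Rademacher (symmetric \(\pm 1\)) random variables. The negative quadratic term is what produces the "localization" effect: functions taking large values incur a penalty, so the supremum is effectively attained by functions close to the optimum, without invoking Talagrand-type concentration machinery.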
Jun-15-2015