Forster Decomposition and Learning Halfspaces with Noise

Diakonikolas, Ilias, Kane, Daniel M., Tzamos, Christos

arXiv.org Machine Learning 

The motivating application for this paper is the problem of(distribution-independent) PAC learning of halfspaces in the presence of label noise, and more specifically in the Massart (or bounded noise) model. Recent work [DGT19] obtained the first computationally efficient learning algorithm with non-trivial error guarantee for this problem. Interestingly, the sample complexity of the [DGT19] algorithm scales polynomially with the bit complexity of the examples (in addition, of course, to the dimension and the inverse of desired accuracy). This bit-complexity dependence in the sample complexity is an artifact of the algorithmic approach in [DGT19]. Information-theoretically, no such dependence is needed -- alas, the standard VC-dimension-based sample upper bound [MN06] is non-constructive. Motivated by this qualitative gap in our understanding, here we develop a methodology that leads to a computationally efficient learning algorithm for Massart halfspaces (matching the error guarantee of [DGT19]) with "strongly polynomial" sample complexity, i.e., sample complexity completely independent of the bit complexity of the examples. Halfspaces and Efficient Learnability We study the binary classification setting, where the goal is to learn a Boolean function from random labeled examples with noisy labels. Our focus is on the problem of learning halfspaces in Valiant's PAC learning model [Val84] when the labels have been corrupted by Massart noise [MN06].