Review for NeurIPS paper: Agnostic Learning of a Single Neuron with Gradient Descent
–Neural Information Processing Systems
Summary and Contributions: The paper considers the problem of agnostically learning a single neuron with respect to the squared loss via gradient descent (GD). It focuses specifically on the guarantees that GD attains. Assuming only that the input distribution is bounded, the authors show that for strictly increasing activation functions (derivative bounded below by a positive constant), GD finds a point that achieves error O(\sqrt{opt}) + \eps, where opt is the loss of the best-fitting neuron. They extend the result to ReLU under standard anti-concentration assumptions (similar to [1,2]). For ReLU, it is known that one can in fact obtain O(opt) + \eps under essentially the same assumptions (see [1]) using an alternate algorithm.
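For concreteness, the setting summarized above can be sketched as follows. This is only an illustrative sketch and not the authors' code or experimental setup: the leaky ReLU activation (a strictly increasing function with derivative bounded below by alpha), the synthetic bounded inputs, the noisy labels standing in for the agnostic setting, the reference neuron w_star, the zero initialization, and the step size are all assumptions chosen for the example.

```python
import random


def leaky_relu(z, alpha=0.5):
    """Strictly increasing activation: slope 1 for z > 0, slope alpha otherwise."""
    return z if z > 0 else alpha * z


def leaky_relu_deriv(z, alpha=0.5):
    """Derivative of leaky_relu; bounded below by alpha > 0."""
    return 1.0 if z > 0 else alpha


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def squared_loss(w, xs, ys):
    """Empirical squared loss (1 / 2n) * sum_i (sigma(w . x_i) - y_i)^2."""
    n = len(xs)
    return sum((leaky_relu(dot(w, x)) - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)


def gradient(w, xs, ys):
    """Gradient of squared_loss with respect to w."""
    n = len(xs)
    g = [0.0] * len(w)
    for x, y in zip(xs, ys):
        z = dot(w, x)
        coeff = (leaky_relu(z) - y) * leaky_relu_deriv(z)
        for j, xj in enumerate(x):
            g[j] += coeff * xj / n
    return g


def run_demo(n=200, steps=300, lr=0.5, seed=0):
    """Run plain GD from w = 0 and return (initial loss, final loss, final w)."""
    rng = random.Random(seed)
    w_star = [1.0, -2.0, 0.5]  # hypothetical reference neuron (assumption)
    # Bounded inputs: each coordinate is uniform on [-1, 1].
    xs = [[rng.uniform(-1, 1) for _ in w_star] for _ in range(n)]
    # Labels need not be realizable; Gaussian noise is a simple stand-in here.
    ys = [leaky_relu(dot(w_star, x)) + rng.gauss(0, 0.1) for x in xs]
    w = [0.0] * len(w_star)
    initial = squared_loss(w, xs, ys)
    for _ in range(steps):
        g = gradient(w, xs, ys)
        w = [wj - lr * gj for wj, gj in zip(w, g)]
    return initial, squared_loss(w, xs, ys), w


if __name__ == "__main__":
    initial, final, w = run_demo()
    print("initial loss:", initial)
    print("final loss:", final)
    print("learned weights:", w)
```

In this toy run the labels are a noisy version of a single neuron, which is a special case of the agnostic setting; the guarantee discussed in the review concerns arbitrary labels, where the achievable error is measured against opt.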
Jan-23-2025, 13:11:16 GMT