Agnostic Learning of a Single Neuron with Gradient Descent
Neural Information Processing Systems
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss $\mathbb{E}_{(x,y)\sim \mathcal{D}}[(\sigma(w^\top x)-y)^2]$ over some unknown joint distribution $\mathcal{D}$ by using gradient descent to minimize the empirical risk induced by a set of i.i.d.
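
The setup described above can be illustrated with a minimal sketch: plain gradient descent on the empirical square loss $\frac{1}{n}\sum_{i=1}^{n}(\sigma(w^\top x_i)-y_i)^2$ of a single neuron. Everything specific here is an assumption made for illustration and is not taken from the source: the ReLU choice for $\sigma$, the synthetic Gaussian data with label noise, the small random initialization, the step size, and the iteration count. The function names (`empirical_risk`, `gradient_descent`, and so on) are hypothetical.

```python
import numpy as np

def relu(z):
    # Assumed activation; the source only specifies a generic sigma.
    return np.maximum(z, 0.0)

def empirical_risk(w, X, y):
    # Mean square loss of the single neuron sigma(w^T x) over the sample.
    return np.mean((relu(X @ w) - y) ** 2)

def gradient(w, X, y):
    # (Sub)gradient of the empirical risk; the ReLU derivative is taken as 1[z > 0].
    z = X @ w
    residual = relu(z) - y
    return 2.0 * X.T @ (residual * (z > 0)) / len(y)

def gradient_descent(X, y, w0, lr=0.1, steps=500):
    # Fixed-step gradient descent; lr and steps are illustrative choices.
    w = w0.copy()
    for _ in range(steps):
        w -= lr * gradient(w, X, y)
    return w

# Synthetic data (assumption): Gaussian inputs, labels from a noisy "teacher" neuron.
rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = relu(X @ w_true) + 0.1 * rng.standard_normal(n)

# Small random start; exactly zero would give a zero ReLU subgradient.
w0 = 0.01 * rng.standard_normal(d)
w_hat = gradient_descent(X, y, w0)

initial_risk = empirical_risk(w0, X, y)
final_risk = empirical_risk(w_hat, X, y)
print(f"empirical risk: {initial_risk:.4f} -> {final_risk:.4f}")
```

In the agnostic setting the labels need not come from any neuron at all; the noisy teacher above is only a convenient way to generate data on which the risk visibly decreases.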
Dec-23-2025, 23:11:28 GMT