1ae6464c6b5d51b363d7d96f97132c75-AuthorFeedback.pdf
–Neural Information Processing Systems
We show that running SGD on the`1 loss outperforms all current algorithms, theoretically8 and empirically. As pointed out, the dependency onη (or rather η) we obtain23 might notbeoptimal and isstill aninteresting open question. Therefore, in the setting we consider, the rate we34 obtaincannotbeimproved.35 To Reviewer 3. The linear model is one of the simplest model we could have considered and linear regression is36 certainly amongst the oldest and most fundamental statistical methods. AsseeninTheorem 4,thetermsdepending onthevarianceσgoto0asσ 0. Hencetheextreme42 case where there is no'nice' noise is not pathological and the algorithm still performs well.
Neural Information Processing Systems
Feb-7-2026, 16:15:34 GMT
- Technology: