All ERMs Can Fail in Stochastic Convex Optimization: Lower Bounds in Linear Dimension
We study the sample complexity of the best-case Empirical Risk Minimizer (ERM) in the setting of stochastic convex optimization. We show that there exists an instance in which the sample size is linear in the dimension, learning is possible, but the Empirical Risk Minimizer is likely to be unique and to overfit. This resolves an open question posed by Feldman. We also extend this result to approximate ERMs. Building on our construction, we further show that (constrained) Gradient Descent can overfit when the horizon and learning rate grow with the sample size. Specifically, we provide a novel generalization lower bound of $\Omega\left(\sqrt{\eta T/m^{1.5}}\right)$ for Gradient Descent, where $\eta$ is the learning rate, $T$ is the horizon, and $m$ is the sample size. This exponentially narrows the gap between the best known upper bound of $O(\eta T/m)$ and the lower bounds from previous constructions.
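To fix notation, the algorithm whose generalization is bounded above is constrained (projected) Gradient Descent on the empirical risk, with step size $\eta$, horizon $T$, and $m$ samples, returning the average iterate. The following is a minimal sketch of that procedure on a toy objective; the specific loss, domain radius, and hyperparameter values are illustrative assumptions, not the paper's hard instance.

```python
import numpy as np

def projected_gd(grad_emp_risk, dim, eta, T, radius=1.0):
    """Constrained Gradient Descent on the empirical risk.

    Projects each iterate back onto the Euclidean ball of the given
    radius and returns the average iterate, as is standard for convex
    objectives.
    """
    w = np.zeros(dim)
    iterates = []
    for _ in range(T):
        iterates.append(w.copy())
        w = w - eta * grad_emp_risk(w)
        norm = np.linalg.norm(w)
        if norm > radius:                 # project onto the constraint set
            w = w * (radius / norm)
    return np.mean(iterates, axis=0)

# Toy empirical risk: mean squared distance to m sample points,
# whose gradient is 2 * (w - sample mean).
rng = np.random.default_rng(0)
m, dim = 20, 5
samples = rng.normal(size=(m, dim))
grad = lambda w: 2 * (w - samples.mean(axis=0))

w_hat = projected_gd(grad, dim, eta=0.1, T=100)
```

On this toy objective the average iterate converges toward the sample mean; the paper's point is that on a carefully constructed instance the same procedure, with $\eta T$ growing in $m$, can be driven to overfit.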
Can Implicit Bias Explain Generalization? Stochastic Convex Optimization as a Case Study
One of the great mysteries of contemporary machine learning is the impressive success of unregularized and overparameterized learning algorithms. In detail, current machine learning practice is to train models with far more parameters than samples and let the algorithm fit the data, oftentimes without any type of regularization. In fact, these algorithms are so overcapacitated that they can even memorize and fit random data (Neyshabur et al., 2015; Zhang et al., 2017). Yet, when trained on real-life data, these algorithms show remarkable performance in generalizing to unseen samples. This phenomenon is often attributed to what is described as the implicit regularization of an algorithm (Neyshabur et al., 2015). Implicit regularization roughly refers to the learner's tendency to implicitly choose certain structured solutions, as if some explicit regularization term appeared in its objective.
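The two phenomena above can be made concrete with a small sketch (an illustrative assumption, not a construction from the paper): an overparameterized linear model exactly memorizes random labels, and among the infinitely many interpolating solutions, gradient descent from zero initialization is known to select the minimum-norm one, computed here in closed form via the pseudoinverse.

```python
import numpy as np

# Overparameterized linear regression: d = 100 parameters, m = 10 samples,
# with purely random +/-1 labels.
rng = np.random.default_rng(1)
m, d = 10, 100
X = rng.normal(size=(m, d))
y = rng.choice([-1.0, 1.0], size=m)

# Minimum-norm interpolant: the solution gradient descent from zero
# converges to on least squares -- one concrete "implicit bias".
w = np.linalg.pinv(X) @ y

train_error = np.max(np.abs(X @ w - y))   # the random labels are fit exactly
```

The model fits the random labels to machine precision despite there being no signal to learn, which is exactly why classical capacity-based arguments cannot explain generalization here, and why implicit bias toward structured (e.g. low-norm) solutions is invoked instead.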