An Efficient Pruning Algorithm for Robust Isotonic Regression
We study a generalization of the classic isotonic regression problem where we allow separable nonconvex objective functions, focusing on the case of estimators used in robust regression. A simple dynamic programming approach allows us to solve this problem to within ε-accuracy (of the global minimum) in time linear in 1/ε and the dimension. We can combine techniques from the convex case with branch-and-bound ideas to form a new algorithm for this problem that naturally exploits the shape of the objective function. Our algorithm achieves the best bounds for both the general nonconvex and convex case (linear in log (1/ε)), while performing much faster in practice than a straightforward dynamic programming approach, especially as the desired accuracy increases.
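To make the baseline concrete, the following is a minimal sketch of the grid dynamic program the abstract compares against: candidate fitted values are discretized to a grid with spacing on the order of ε, and a running prefix-minimum enforces monotonicity at each position. This is not the pruning/branch-and-bound algorithm itself; the Huber loss and all names here are illustrative assumptions.

```python
import numpy as np

def isotonic_dp(y, loss, grid):
    """Approximate isotonic regression with a separable (possibly nonconvex)
    loss by dynamic programming over a fixed value grid.
    This is the straightforward O(n * |grid|) baseline, not the pruning
    algorithm described in the abstract.

    y    : observations, shape (n,)
    loss : vectorized loss on residuals, e.g. a Huber-type robust loss
    grid : sorted 1-D array of candidate fitted values (spacing ~ eps)
    """
    n, m = len(y), len(grid)
    # cost[v] = best total loss over y[:i+1] whose last fitted value is grid[v]
    cost = loss(grid - y[0])
    choice = np.zeros((n, m), dtype=int)  # argmin pointers for backtracking
    for i in range(1, n):
        # prefix minimum enforces monotonicity: previous value <= current value
        arg = np.zeros(m, dtype=int)
        best = 0
        for v in range(1, m):
            if cost[v] < cost[best]:
                best = v
            arg[v] = best
        choice[i] = arg
        cost = cost[arg] + loss(grid - y[i])
    # backtrack the (approximately) optimal monotone fit
    fit = np.empty(n)
    v = int(np.argmin(cost))
    for i in range(n - 1, -1, -1):
        fit[i] = grid[v]
        if i > 0:
            v = int(choice[i, v])
    return fit

# illustrative usage with a Huber loss and one gross outlier
huber = lambda r: np.where(np.abs(r) <= 1.0, 0.5 * r**2, np.abs(r) - 0.5)
y = np.array([1.0, 0.2, 2.5, 10.0, 3.1])
grid = np.linspace(y.min(), y.max(), 200)  # grid spacing plays the role of eps
print(isotonic_dp(y, huber, grid))
```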
The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares
Minimax optimal convergence rates for numerous classes of stochastic convex optimization problems are well characterized, where the majority of results utilize iterate averaged stochastic gradient descent (SGD) with polynomially decaying step sizes. In contrast, the behavior of SGD's final iterate has received much less attention despite its widespread use in practice. Motivated by this observation, this work provides a detailed study of the following question: what rate is achievable using the final iterate of SGD for the streaming least squares regression problem, with and without strong convexity? First, this work shows that even if the time horizon T (i.e., the number of iterations that SGD is run for) is known in advance, the behavior of SGD's final iterate with any polynomially decaying learning rate scheme is highly sub-optimal compared to the statistical minimax rate (by a condition number factor in the strongly convex case and a factor of $\sqrt{T}$ in the non-strongly convex case). In contrast, this paper shows that Step Decay schedules, which cut the learning rate by a constant factor every constant number of epochs (i.e., the learning rate decays geometrically), offer significant improvements over any polynomially decaying step size schedule. In particular, the behavior of the final iterate with step decay schedules is off from the statistical minimax rate by only log factors (in the condition number for the strongly convex case, and in T for the non-strongly convex case). Finally, in stark contrast to the known-horizon case, this paper shows that the anytime (i.e., limiting) behavior of SGD's final iterate is poor, in that it queries iterates with highly sub-optimal function value infinitely often (i.e., in a limsup sense), irrespective of the step size scheme employed. These results demonstrate the subtlety in establishing optimal learning rate schedules (for the final iterate) for stochastic gradient procedures in fixed time horizon settings.
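As an illustration of the two schedule families being contrasted, here is a minimal sketch; the halving factor, drop interval, and polynomial power are illustrative assumptions, not the paper's tuned constants.

```python
def step_decay_lr(lr0, epoch, drop_every=10, factor=0.5):
    # geometric ("step") decay: cut the step size by `factor`
    # every `drop_every` epochs
    return lr0 * factor ** (epoch // drop_every)

def poly_decay_lr(lr0, t, power=0.5):
    # polynomial decay lr0 / (t+1)^power, the family whose final iterate
    # the lower bound applies to
    return lr0 / (t + 1) ** power

# example: step-decay learning rates over 50 epochs
rates = [step_decay_lr(0.1, e) for e in range(50)]
```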
Fine-Grained Analysis of Stability and Generalization for Modern Meta Learning Algorithms
The support/query episodic training strategy has been widely applied in modern meta learning algorithms. Supposing that the $n$ training episodes and the test episodes are sampled independently from the same environment, previous work has derived a generalization bound of $O(1/\sqrt{n})$ for smooth non-convex functions via algorithmic stability analysis. In this paper, we provide a fine-grained analysis of stability and generalization for modern meta learning algorithms by considering more general situations. Firstly, we develop matching lower and upper stability bounds for meta learning algorithms with two types of loss functions: (1) nonsmooth convex functions with $\alpha$-H{\"o}lder continuous subgradients $(\alpha \in [0,1))$; (2) smooth (including convex and non-convex) functions. Our tight stability bounds show that, in the nonsmooth convex case, meta learning algorithms can be inherently less stable than in the smooth convex case.
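For readers unfamiliar with the setup, a minimal sketch of the support/query episodic training strategy follows; the split sizes and the adapt/query_loss/meta_update callables are hypothetical placeholders for whatever base learner and meta-optimizer a concrete algorithm uses.

```python
import numpy as np

def split_episode(x, y, n_support, rng):
    """Randomly split one episode into a support set (used for task-level
    adaptation) and a query set (used for the meta-objective)."""
    idx = rng.permutation(len(x))
    s, q = idx[:n_support], idx[n_support:]
    return (x[s], y[s]), (x[q], y[q])

def episodic_meta_training(episodes, adapt, query_loss, meta_update,
                           n_support, seed=0):
    """Generic support/query meta-training loop over the n training episodes."""
    rng = np.random.default_rng(seed)
    for x, y in episodes:
        (xs, ys), (xq, yq) = split_episode(x, y, n_support, rng)
        task_params = adapt(xs, ys)                    # inner, per-episode fit
        meta_update(query_loss(task_params, xq, yq))   # outer, meta-level step
```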