Goto

Collaborating Authors

 Optimization


On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions

Neural Information Processing Systems

In this paper, we study Adam in non-convex smooth scenarios with potential unbounded gradients and affine variance noise. We consider a general noise model which governs affine variance noise, bounded noise, and sub-Gaussian noise. We show that Adam with a specific hyper-parameter setup can find a stationary point with a O (1 / T) rate in high probability under this general noise model where T denotes total number iterations, matching the lower rate of stochastic first-order algorithms up to logarithm factors.









Breaking Long-Tailed Learning Bottlenecks: A Controllable Paradigm with Hypernetwork-Generated Diverse Experts

Neural Information Processing Systems

We generate a set of diverse expert models via hypernetworks to cover all possible distribution scenarios, and optimize the model ensemble to adapt to any test distribution. Crucially, in any distribution scenario, we can flexibly output a dedicated model solution that matches the user's preference.