



Asymptotic Optimism for Tensor Regression Models with Applications to Neural Network Compression

Shi, Haoming, Chi, Eric C., Luo, Hengrui

arXiv.org Machine Learning

We study rank selection for low-rank tensor regression under a random-covariate design. Under a Gaussian random-design model and some mild conditions, we derive population expressions for the expected training-testing discrepancy (optimism) for both CP and Tucker decompositions. We further demonstrate that the optimism is minimized at the true tensor rank for both CP and Tucker regression. This yields a prediction-oriented rank-selection rule that aligns with cross-validation and extends naturally to tensor-model averaging. We also discuss conditions under which under- or over-ranked models may appear preferable, thereby clarifying the scope of the method. Finally, we showcase its practical utility on a real-world image regression task and extend it to tensor-based compression of neural networks, highlighting its potential for model selection in deep learning.
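
For readers new to the term, a generic squared-error form of optimism for a rank-r fit is written out below. The notation is ours, not the paper's: y_i is a scalar response, X_i a tensor covariate, B̂_r the coefficient tensor estimated at rank r, and (X_0, y_0) an independent draw from the same random design. The paper's exact definition and normalization may differ; the abstract's rule picks the rank that minimizes (an estimate of) this quantity.

```latex
\operatorname{opt}(r) \;=\;
\mathbb{E}\Big[\big(y_0 - \langle \mathcal{X}_0, \widehat{\mathcal{B}}_r \rangle\big)^2\Big]
\;-\;
\mathbb{E}\Big[\tfrac{1}{n}\sum_{i=1}^{n}\big(y_i - \langle \mathcal{X}_i, \widehat{\mathcal{B}}_r \rangle\big)^2\Big],
\qquad
\hat r \;=\; \arg\min_{r}\, \widehat{\operatorname{opt}}(r)
```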



Optimistic Meta-Gradients

Neural Information Processing Systems

We study the connection between gradient-based meta-learning and convex optimisation. We observe that gradient descent with momentum is a special case of meta-gradients, and building on recent results in optimisation, we prove convergence rates for meta-learning in the single task setting.
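
The observation rests on momentum being a fixed linear function of past gradients. As a minimal sketch that checks only the momentum side, not the paper's meta-gradient construction, the code below runs heavy-ball momentum (v ← βv + g, x ← x − ηv) on an assumed one-dimensional quadratic. It confirms that this produces the same iterates as the unrolled form x_{t+1} = x_t − η Σ_k β^{t−k} g_k. The function names are hypothetical.

```python
from typing import Callable, List


def heavy_ball(grad: Callable[[float], float], x0: float, lr: float,
               beta: float, steps: int) -> List[float]:
    """Gradient descent with heavy-ball momentum: v <- beta*v + g, x <- x - lr*v."""
    x, v, xs = x0, 0.0, [x0]
    for _ in range(steps):
        g = grad(x)
        v = beta * v + g          # velocity accumulates past gradients
        x = x - lr * v
        xs.append(x)
    return xs


def discounted_sum_form(grad: Callable[[float], float], x0: float, lr: float,
                        beta: float, steps: int) -> List[float]:
    """Same update written explicitly as a beta-discounted sum of all past gradients."""
    x, grads, xs = x0, [], [x0]
    for t in range(steps):
        grads.append(grad(x))
        step = sum(beta ** (t - k) * g for k, g in enumerate(grads))
        x = x - lr * step
        xs.append(x)
    return xs


if __name__ == "__main__":
    # Assumed objective f(x) = x^2 / 2, so grad(x) = x.
    a = heavy_ball(lambda x: x, 5.0, 0.1, 0.9, 100)
    b = discounted_sum_form(lambda x: x, 5.0, 0.1, 0.9, 100)
    print(max(abs(p - q) for p, q in zip(a, b)))
```

The two functions differ only in bookkeeping. The explicit sum makes visible that momentum applies a fixed, geometrically decaying weighting to the gradient history.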


Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach

Neural Information Processing Systems

In this paper, we propose an online convex optimization approach with two different levels of adaptivity. On a higher level, our approach is agnostic to the unknown types and curvatures of the online functions, while at a lower level, it can exploit the unknown niceness of the environments and attain problem-dependent guarantees.
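
The two-level idea can be illustrated with the common recipe of a meta-learner running Hedge over several base learners. This sketch is a simplification, not the paper's multi-layer algorithm, and rests on several assumptions. The base learners are projected online gradient descent with hypothetical step sizes. The losses are f_t(x) = (x − c_t)^2 on [−1, 1]. Hedge receives the linearized losses ⟨∇f_t(x_t), x_{t,i}⟩.

```python
import math
from typing import List, Sequence


def project(x: float, radius: float = 1.0) -> float:
    """Euclidean projection onto the interval [-radius, radius]."""
    return max(-radius, min(radius, x))


def hedge_over_ogd(targets: Sequence[float], step_sizes: Sequence[float],
                   eta_meta: float) -> List[float]:
    """Play online on assumed losses f_t(x) = (x - c_t)^2 and return the combined plays."""
    n = len(step_sizes)
    base = [0.0] * n            # current point of each base learner (OGD)
    weights = [1.0 / n] * n     # meta-level Hedge weights
    plays = []
    for c in targets:
        # Meta prediction: weighted average of the base learners' points.
        x = sum(w * b for w, b in zip(weights, base))
        plays.append(x)
        g = 2.0 * (x - c)       # gradient of f_t at the combined point
        # Meta update: Hedge on linearized losses g * x_i (shifted by the min for stability).
        losses = [g * b for b in base]
        m = min(losses)
        weights = [w * math.exp(-eta_meta * (l - m)) for w, l in zip(weights, losses)]
        z = sum(weights)
        weights = [w / z for w in weights]
        # Base update: each learner takes a projected gradient step on its own point.
        base = [project(b - lr * 2.0 * (b - c)) for b, lr in zip(base, step_sizes)]
    return plays


if __name__ == "__main__":
    plays = hedge_over_ogd([0.5] * 200, [0.01, 0.1, 0.4], eta_meta=1.0)
    print(plays[-1])
```

Linearizing the loss for the meta-level keeps Hedge's feedback cheap: every base learner is scored with the single gradient computed at the combined point.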





Gradient-Variation Online Learning under Generalized Smoothness

Neural Information Processing Systems

Gradient-variation online learning aims to achieve regret guarantees that scale with variations in the gradients of online functions, which is crucial for attaining fast convergence in games and robustness in stochastic optimization, and has therefore received increasing attention. Existing results often require the smoothness condition by imposing a fixed bound on gradient Lipschitzness, which may be unrealistic in practice. Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms. In this paper, we systematically study gradient-variation online learning under generalized smoothness. We extend the classic optimistic mirror descent algorithm to derive gradient-variation regret by analyzing stability over the optimization trajectory and exploiting smoothness locally. Then, we explore universal online learning, designing a single algorithm with the optimal gradient-variation regrets for convex and strongly convex functions simultaneously, without requiring prior knowledge of curvature. This algorithm adopts a two-layer structure with a meta-algorithm running over a group of base-learners. To ensure favorable guarantees, we design a new Lipschitz-adaptive meta-algorithm, capable of handling potentially unbounded gradients while ensuring a second-order bound to effectively ensemble the base-learners. Finally, we provide applications to fast-rate convergence in games and stochastic extended adversarial optimization.
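
The method above builds on optimistic mirror descent. When the squared Euclidean distance serves as the mirror map, that method reduces to optimistic online gradient descent. It plays a point computed from a hint (here the previous gradient), then updates an auxiliary iterate with the observed gradient. This is a minimal one-dimensional sketch that assumes a fixed step size and the feasible set [−1, 1]. It omits the paper's step-size handling under generalized smoothness, and the function names are hypothetical.

```python
from typing import Callable, List


def clip(x: float, radius: float = 1.0) -> float:
    """Euclidean projection onto [-radius, radius]."""
    return max(-radius, min(radius, x))


def optimistic_ogd(grads: List[Callable[[float], float]], eta: float,
                   radius: float = 1.0) -> List[float]:
    """Optimistic online gradient descent with hint M_t = previous observed gradient.

    x_t        = Proj(xhat_t - eta * M_t)   # point actually played
    xhat_{t+1} = Proj(xhat_t - eta * g_t)   # auxiliary iterate, updated with true gradient
    """
    xhat, hint, plays = 0.0, 0.0, []
    for grad in grads:
        x = clip(xhat - eta * hint, radius)
        plays.append(x)
        g = grad(x)                       # gradient revealed after playing x
        xhat = clip(xhat - eta * g, radius)
        hint = g                          # optimistic guess for the next round
    return plays


if __name__ == "__main__":
    # Assumed losses f_t(x) = (x - 0.3)^2 every round, so gradients never vary.
    plays = optimistic_ogd([lambda x: 2.0 * (x - 0.3)] * 200, eta=0.1)
    print(plays[-1])
```

When consecutive gradients are close, the hint nearly matches the true gradient. That is the mechanism by which regret bounds of this kind scale with gradient variation rather than with the horizon.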


MobILE: Model-Based Imitation Learning From Observation Alone

Neural Information Processing Systems

We provide a unified analysis for MobILE and demonstrate that MobILE enjoys strong performance guarantees for classes of MDP dynamics that satisfy certain well-studied notions of structural complexity. We also show that the imitation learning from observation (ILFO) problem is strictly harder than the standard imitation learning (IL) problem by presenting an exponential sample complexity separation between IL and ILFO.
