AITopics | Lin Xiao

The use of momentum in stochastic gradient methods has become a widespread practice in machine learning. Different variants of momentum, including heavyball momentum, Nesterov's accelerated gradient (NAG), and quasi-hyperbolic momentum (QHM), have demonstrated success on various tasks. Despite these empirical successes, there is a lack of clear understanding of how the momentum parameters affect convergence and various performance measures of different algorithms. In this paper, we use the general formulation of QHM to give a unified analysis of several popular algorithms, covering their asymptotic convergence conditions, stability regions, and properties of their stationary distributions. In addition, by combining the results on convergence rates and stationary distributions, we obtain sometimes counter-intuitive practical guidelines for setting the learning rate and momentum parameters.

artificial intelligence, convergence rate, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe (0.68)
North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.73)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.72)

Add feedback

A Stochastic Composite Gradient Method with Incremental Variance Reduction

Junyu Zhang, Lin Xiao

Neural Information Processing SystemsMar-26-2025, 11:49:11 GMT

We consider the problem of minimizing the composition of a smooth (nonconvex) function and a smooth vector mapping, where the inner mapping is in the form of an expectation over some random variable or a finite sum. We propose a stochastic composite gradient method that employs an incremental variance-reduced estimator for both the inner vector mapping and its Jacobian. We show that this method achieves the same orders of complexity as the best known first-order methods for minimizing expected-value and finite-sum nonconvex functions, despite the additional outer composition which renders the composite gradient estimator biased. This finding enables a much broader range of applications in machine learning to benefit from the low complexity of incremental variance-reduction methods.

artificial intelligence, machine learning, optimization, (16 more...)

Neural Information Processing Systems

Country: North America > United States > Minnesota (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Learning SMaLL Predictors

Vikas Garg, Ofer Dekel, Lin Xiao

Neural Information Processing SystemsMar-23-2025, 08:18:31 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, dataset, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America (0.46)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

A Stochastic Composite Gradient Method with Incremental Variance Reduction

Junyu Zhang, Lin Xiao

Neural Information Processing SystemsJan-26-2025, 06:45:02 GMT

We consider the problem of minimizing the composition of a smooth (nonconvex) function and a smooth vector mapping, where the inner mapping is in the form of an expectation over some random variable or a finite sum. We propose a stochastic composite gradient method that employs an incremental variance-reduced estimator for both the inner vector mapping and its Jacobian. We show that this method achieves the same orders of complexity as the best known first-order methods for minimizing expected-value and finite-sum nonconvex functions, despite the additional outer composition which renders the composite gradient estimator biased. This finding enables a much broader range of applications in machine learning to benefit from the low complexity of incremental variance-reduction methods.

artificial intelligence, machine learning, optimization, (15 more...)

Neural Information Processing Systems

Country: North America > United States > Minnesota (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Understanding the Role of Momentum in Stochastic Gradient Methods

Igor Gitman, Hunter Lang, Pengchuan Zhang, Lin Xiao

Neural Information Processing SystemsJan-23-2025, 14:13:16 GMT

The use of momentum in stochastic gradient methods has become a widespread practice in machine learning. Different variants of momentum, including heavyball momentum, Nesterov's accelerated gradient (NAG), and quasi-hyperbolic momentum (QHM), have demonstrated success on various tasks. Despite these empirical successes, there is a lack of clear understanding of how the momentum parameters affect convergence and various performance measures of different algorithms. In this paper, we use the general formulation of QHM to give a unified analysis of several popular algorithms, covering their asymptotic convergence conditions, stability regions, and properties of their stationary distributions. In addition, by combining the results on convergence rates and stationary distributions, we obtain sometimes counter-intuitive practical guidelines for setting the learning rate and momentum parameters.

artificial intelligence, convergence rate, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe (0.68)
North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.73)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.72)

Add feedback

Learning SMaLL Predictors

Vikas Garg, Ofer Dekel, Lin Xiao

Neural Information Processing SystemsOct-7-2024, 03:58:48 GMT

We introduce a new framework for learning in severely resource-constrained settings. Our technique delicately amalgamates the representational richness of multiple linear predictors with the sparsity of Boolean relaxations, and thereby yields classifiers that are compact, interpretable, and accurate. We provide a rigorous formalism of the learning problem, and establish fast convergence of the ensuing algorithm via relaxation to a minimax saddle point objective.

artificial intelligence, dataset, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America (0.46)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Q-LDA: Uncovering Latent Patterns in Text-based Sequential Decision Processes

Jianshu Chen, Chong Wang, Lin Xiao, Ji He, Lihong Li, Li Deng

Neural Information Processing SystemsOct-4-2024, 04:32:07 GMT

In sequential decision making, it is often important and useful for end users to understand the underlying patterns or causes that lead to the corresponding decisions. However, typical deep reinforcement learning algorithms seldom provide such information due to their black-box nature. In this paper, we present a probabilistic model, Q-LDA, to uncover latent patterns in text-based sequential decision processes. The model can be understood as a variant of latent topic models that are tailored to maximize total rewards; we further draw an interesting connection between an approximate maximum-likelihood estimation of Q-LDA and the celebrated Q-learning algorithm. We demonstrate in the text-game domain that our proposed method not only provides a viable mechanism to uncover latent patterns in decision processes, but also obtains state-of-the-art rewards in these games.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.68)

Genre: Workflow (0.46)

Industry: Leisure & Entertainment > Games (0.66)