AITopics | stochastic gradient algorithm

Stochastic Gradients under Nuisances

Neural Information Processing SystemsJun-12-2026, 23:18:01 GMT

Stochastic gradient optimization is the dominant learning paradigm for a variety of scenarios, from classical supervised learning to modern self-supervised learning. We consider stochastic gradient algorithms for learning problems whose objectives rely on unknown nuisance parameters, and establish non-asymptotic convergence guarantees. Our results show that, while the presence of a nuisance can alter the optimum and upset the optimization trajectory, the classical stochastic gradient algorithm may still converge under appropriate conditions, such as Neyman orthogonality. Moreover, even when Neyman orthogonality is not satisfied, we also show that an algorithm variant with approximately orthogonalized updates (with an approximately orthogonalized gradient oracle) may achieve similar convergence rates. Examples from orthogonal statistical learning/double machine learning and causal inference are discussed.

artificial intelligence, machine learning, proceedings, (4 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.62)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization

Neural Information Processing SystemsMar-17-2026, 02:04:45 GMT

We analyze stochastic gradient algorithms for optimizing nonconvex, nonsmooth finite-sum problems. In particular, the objective function is given by the summation of a differentiable (possibly nonconvex) component, together with a possibly non-differentiable but convex component. We propose a proximal stochastic gradient algorithm based on variance reduction, called ProxSVRG+. Our main contribution lies in the analysis of ProxSVRG+. It recovers several existing convergence results and improves/generalizes them (in terms of the number of stochastic gradient oracle calls and proximal oracle calls). In particular, ProxSVRG+ generalizes the best results given by the SCSG algorithm, recently proposed by [Lei et al., NIPS'17] for the smooth nonconvex case. ProxSVRG+ is also more straightforward than SCSG and yields simpler analysis. Moreover, ProxSVRG+ outperforms the deterministic proximal gradient descent (ProxGD) for a wide range of minibatch sizes, which partially solves an open problem proposed in [Reddi et al., NIPS'16].

artificial intelligence, machine learning, proceedings, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Zeroth-order (Non)-Convex Stochastic Optimization via Conditional Gradient and Gradient Updates

Neural Information Processing SystemsMar-16-2026, 19:54:07 GMT

In this paper, we propose and analyze zeroth-order stochastic approximation algorithms for nonconvex and convex optimization. Specifically, we propose generalizations of the conditional gradient algorithm achieving rates similar to the standard stochastic gradient algorithm using only zeroth-order information. Furthermore, under a structural sparsity assumption, we first illustrate an implicit regularization phenomenon where the standard stochastic gradient algorithm with zeroth-order information adapts to the sparsity of the problem at hand by just varying the step-size. Next, we propose a truncated stochastic gradient algorithm with zeroth-order information, whose rate of convergence depends only poly-logarithmically on the dimensionality.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.92)

Add feedback

A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization

Neural Information Processing SystemsNov-20-2025, 23:11:10 GMT

We analyze stochastic gradient algorithms for optimizing nonconvex, nonsmooth finite-sum problems. In particular, the objective function is given by the summation of a differentiable (possibly nonconvex) component, together with a possibly non-differentiable but convex component. We propose a proximal stochastic gradient algorithm based on variance reduction, called ProxSVRG+. Our main contribution lies in the analysis of ProxSVRG+. It recovers several existing convergence results and improves/generalizes them (in terms of the number of stochastic gradient oracle calls and proximal oracle calls). In particular, ProxSVRG+ generalizes the best results given by the SCSG algorithm, recently proposed by [Lei et al., NIPS'17] for the smooth nonconvex case. ProxSVRG+ is also more straightforward than SCSG and yields simpler analysis. Moreover, ProxSVRG+ outperforms the deterministic proximal gradient descent (ProxGD) for a wide range of minibatch sizes, which partially solves an open problem proposed in [Reddi et al., NIPS'16].

name change, nonsmooth nonconvex optimization, simple proximal stochastic gradient method, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Zeroth-order (Non)-Convex Stochastic Optimization via Conditional Gradient and Gradient Updates

Neural Information Processing SystemsNov-20-2025, 22:02:46 GMT

In this paper, we propose and analyze zeroth-order stochastic approximation algorithms for nonconvex and convex optimization. Specifically, we propose generalizations of the conditional gradient algorithm achieving rates similar to the standard stochastic gradient algorithm using only zeroth-order information. Furthermore, under a structural sparsity assumption, we first illustrate an implicit regularization phenomenon where the standard stochastic gradient algorithm with zeroth-order information adapts to the sparsity of the problem at hand by just varying the step-size. Next, we propose a truncated stochastic gradient algorithm with zeroth-order information, whose rate of convergence depends only poly-logarithmically on the dimensionality.

conditional gradient and gradient update, gradient algorithm, stochastic optimization, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.92)

Add feedback

Malliavin Calculus with Weak Derivatives for Counterfactual Stochastic Optimization

Krishnamurthy, Vikram, Snow, Luke

arXiv.org Artificial IntelligenceOct-2-2025

We study counterfactual stochastic optimization of conditional loss functionals under misspecified and noisy gradient information. The difficulty is that when the conditioning event has vanishing or zero probability, naive Monte Carlo estimators are prohibitively inefficient; kernel smoothing, though common, suffers from slow convergence. We propose a two-stage kernel-free methodology. First, we show using Malliavin calculus that the conditional loss functional of a diffusion process admits an exact representation as a Skorohod integral, yielding variance comparable to classical Monte-Carlo variance. Second, we establish that a weak derivative estimate of the conditional loss functional with respect to model parameters can be evaluated with constant variance, in contrast to the widely used score function method whose variance grows linearly in the sample path length. Together, these results yield an efficient framework for counterfactual conditional stochastic gradient algorithms in rare-event regimes.

artificial intelligence, estimator, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2510.00297

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Inverse Reinforcement Learning using Revealed Preferences and Passive Stochastic Optimization

Krishnamurthy, Vikram

arXiv.org Artificial IntelligenceJul-8-2025

This monograph, spanning three chapters, explores Inverse Reinforcement Learning (IRL). The first two chapters view inverse reinforcement learning (IRL) through the lens of revealed preferences from microeconomics while the third chapter studies adaptive IRL via Langevin dynamics stochastic gradient algorithms. Chapter uses classical revealed preference theory (Afriat's theorem and extensions) to identify constrained utility maximizers based on observed agent actions. This allows for the reconstruction of set-valued estimates of an agent's utility. We illustrate this procedure by identifying the presence of a cognitive radar and reconstructing its utility function. The chapter also addresses the construction of a statistical detector for utility maximization behavior when agent actions are corrupted by noise. Chapter 2 studies Bayesian IRL. It investigates how an analyst can determine if an observed agent is a rationally inattentive Bayesian utility maximizer (i.e., simultaneously optimizing its utility and observation likelihood). The chapter discusses inverse stopping-time problems, focusing on reconstructing the continuation and stopping costs of a Bayesian agent operating over a random horizon. We then apply this IRL methodology to identify the presence of a Bayes-optimal sequential detector. Additionally, Chapter 2 provides a concise overview of discrete choice models, inverse Bayesian filtering, and inverse stochastic gradient algorithms for adaptive IRL. Finally, Chapter 3 introduces an adaptive IRL approach utilizing passive Langevin dynamics. This method aims to track time-varying utility functions given noisy and misspecified gradients. In essence, the adaptive IRL algorithms presented in Chapter 3 can be conceptualized as inverse stochastic gradient algorithms, as they learn the utility function in real-time while a stochastic gradient algorithm is in operation.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

2507.04396

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > Tompkins County > Ithaca (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Genre: Summary/Review (1.00)

Industry:

Information Technology > Security & Privacy (0.67)
Telecommunications (0.67)
Information Technology > Services (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
(2 more...)

Add feedback

Online Learning Algorithms in Hilbert Spaces with $\beta-$ and $\phi-$Mixing Sequences

Roy, Priyanka, Saminger-Platz, Susanne

arXiv.org Machine LearningFeb-5-2025

In this paper, we study an online algorithm in a reproducing kernel Hilbert spaces (RKHS) based on a class of dependent processes, called the mixing process. For such a process, the degree of dependence is measured by various mixing coefficients. As a representative example, we analyze a strictly stationary Markov chain, where the dependence structure is characterized by the $\beta-$ and $\phi-$mixing coefficients. For these dependent samples, we derive nearly optimal convergence rates. Our findings extend existing error bounds for i.i.d. observations, demonstrating that the i.i.d. case is a special instance of our framework. Moreover, we explicitly account for an additional factor introduced by the dependence structure in the Markov chain.

artificial intelligence, machine learning, markov chain, (16 more...)

arXiv.org Machine Learning

2502.03551

Country:

North America > United States > New York (0.04)
Europe > Austria > Upper Austria > Linz (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
(2 more...)

Genre: Research Report (0.84)

Industry: Education > Educational Setting > Online (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.59)

Add feedback

Upper Bounds for Learning in Reproducing Kernel Hilbert Spaces for Non IID Samples

Roy, Priyanka, Saminger-Platz, Susanne

arXiv.org Machine LearningJan-2-2025

In this paper, we study a Markov chain-based stochastic gradient algorithm in general Hilbert spaces, aiming to approximate the optimal solution of a quadratic loss function. We establish probabilistic upper bounds on its convergence. We further extend these results to an online regularized learning algorithm in reproducing kernel Hilbert spaces, where the samples are drawn along a Markov chain trajectory hence the samples are of the non i.i.d.

algorithm, markov chain, markov chain trajectory, (14 more...)

arXiv.org Machine Learning

2410.08361

Country:

North America > United States > New York (0.04)
Europe > Austria > Upper Austria > Linz (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Add feedback

Finite Sample and Large Deviations Analysis of Stochastic Gradient Algorithm with Correlated Noise

Yin, George, Krishnamurthy, Vikram

arXiv.org Artificial IntelligenceOct-10-2024

This paper focuses on finite sample analysis for stochastic gradient algorithms. The motivation stems from a vast varieties of applications. In particular, the recent advances on stochastic optimization in conjunction with machine learning have opened up new domains. A particular emphasis of the learning community requires us taking a careful look at of the finite sample analysis. Well, it is well known that stochastic gradient algorithms or stochastic approximation algorithms are normally concentrated on dealing with asymptotic properties of the recursive algorithms. However, the learning community placed more effort for carrying out analysis of finite sample properties of the recursive algorithms; see for example,... and references therein.

artificial intelligence, machine learning, stochastic gradient algorithm, (14 more...)

arXiv.org Artificial Intelligence

2410.08449

Country:

North America > United States > New York (0.04)
North America > United States > Connecticut (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.82)

Add feedback

Filters

Collaborating Authors

stochastic gradient algorithm

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Stochastic Gradients under Nuisances

A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization

Zeroth-order (Non)-Convex Stochastic Optimization via Conditional Gradient and Gradient Updates

A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization

Zeroth-order (Non)-Convex Stochastic Optimization via Conditional Gradient and Gradient Updates

Malliavin Calculus with Weak Derivatives for Counterfactual Stochastic Optimization

Inverse Reinforcement Learning using Revealed Preferences and Passive Stochastic Optimization

Online Learning Algorithms in Hilbert Spaces with $\beta-$ and $\phi-$Mixing Sequences

Upper Bounds for Learning in Reproducing Kernel Hilbert Spaces for Non IID Samples

Finite Sample and Large Deviations Analysis of Stochastic Gradient Algorithm with Correlated Noise