AITopics | Gradient Descent

Gradient Descent

Stable Nonconvex-Nonconcave Training via Linear Interpolation

Neural Information Processing Systems, Oct-9-2025, 02:37:55 GMT

By replacing the inner optimizer in RAPP we rediscover the family of Lookahead algorithms for which we establish convergence in cohypomonotone problems even when the base optimizer is taken to be gradient descent ascent.
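
As a rough illustration of the Lookahead idea with gradient descent ascent as the base optimizer, the sketch below uses a toy bilinear game f(x, y) = x * y. The game, step size, inner step count, and interpolation weight are all assumptions chosen for illustration; this is not the paper's algorithm or its cohypomonotone setting, only the generic pattern of running a few fast steps and then linearly interpolating back toward the slow iterate.

```python
import math


def gda_step(x, y, lr):
    """One simultaneous gradient descent ascent step on f(x, y) = x * y."""
    grad_x, grad_y = y, x  # df/dx = y, df/dy = x
    return x - lr * grad_x, y + lr * grad_y


def run_gda(x, y, lr, steps):
    """Plain GDA: x descends on f while y ascends on f."""
    for _ in range(steps):
        x, y = gda_step(x, y, lr)
    return x, y


def run_lookahead_gda(x, y, lr, inner_steps, alpha, outer_steps):
    """Lookahead wrapper: run inner GDA steps, then interpolate the slow iterate."""
    for _ in range(outer_steps):
        fast_x, fast_y = run_gda(x, y, lr, inner_steps)
        # Linear interpolation between the slow iterate and the fast iterate.
        x = x + alpha * (fast_x - x)
        y = y + alpha * (fast_y - y)
    return x, y


def norm(point):
    """Distance from the equilibrium (0, 0)."""
    return math.hypot(point[0], point[1])


# Hypothetical settings, chosen only for this toy example.
start = (1.0, 1.0)
plain = run_gda(*start, lr=0.1, steps=1000)
lookahead = run_lookahead_gda(*start, lr=0.1, inner_steps=5, alpha=0.5, outer_steps=200)
print(norm(start), norm(plain), norm(lookahead))
```

With these settings, plain GDA spirals away from the equilibrium at the origin, while the interpolated iterate moves toward it, using the same total number of gradient evaluations.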

artificial intelligence, convergence, machine learning, (16 more...)

Country:

Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Online Performative Gradient Descent for Learning Nash Equilibria in Decision-Dependent Games

Zihan Zhu (Duke University), Ethan X. Fang (Duke University), Zhuoran Yang (Yale University)

Neural Information Processing Systems, Oct-9-2025, 01:53:45 GMT

We focus on finding the Nash equilibrium of decision-dependent games in the bandit feedback setting.

artificial intelligence, decision-dependent game, machine learning, (17 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.84)

On the Convergence of Black-Box Variational Inference

Neural Information Processing Systems, Oct-9-2025, 00:57:13 GMT

We provide the first convergence guarantee for black-box variational inference (BBVI) with the reparameterization gradient. While preliminary investigations worked on simplified versions of BBVI (e.g., bounded domain, bounded support, or optimizing only the scale), our setup does not need any such algorithmic modifications. Our results hold for log-smooth posterior densities, with and without strong log-concavity, and for the location-scale variational family. Notably, our analysis reveals that certain algorithm design choices commonly employed in practice, such as nonlinear parameterizations of the scale matrix, can result in suboptimal convergence rates. Fortunately, running BBVI with proximal stochastic gradient descent fixes these limitations and thus achieves the strongest known convergence guarantees. We evaluate this theoretical insight by comparing proximal SGD against other standard implementations of BBVI on large-scale Bayesian inference problems.
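
To make the ingredients named in this abstract concrete, here is a minimal one-dimensional sketch of BBVI with the reparameterization gradient and a proximal SGD step. It assumes a hypothetical Gaussian target, a Gaussian variational family q = N(m, s^2) with a linear scale parameterization, and a fixed step size. The closed-form proximal map for the entropy term -log(s) is derived here from the definition of the proximal operator; none of this is taken from the authors' implementation.

```python
import math
import random


def energy_grad(z, target_mean, target_std):
    """Gradient of the negative log density of N(target_mean, target_std^2)."""
    return (z - target_mean) / target_std ** 2


def prox_neg_log(s, step):
    """Proximal map of h(s) = -log(s): argmin_u -step*log(u) + (u - s)^2 / 2.

    Setting the derivative to zero gives u^2 - s*u - step = 0; the positive
    root is returned, so the scale stays positive for any real input s.
    """
    return 0.5 * (s + math.sqrt(s * s + 4.0 * step))


def bbvi_proximal_sgd(target_mean, target_std, step=0.01, iters=5000, seed=0):
    """Fit q = N(m, s^2) by proximal SGD on the negative ELBO (toy sketch)."""
    rng = random.Random(seed)
    m, s = 0.0, 1.0
    for _ in range(iters):
        eps = rng.gauss(0.0, 1.0)
        z = m + s * eps  # reparameterization: z is a sample from q
        g = energy_grad(z, target_mean, target_std)
        m -= step * g          # stochastic gradient step on the location
        s -= step * g * eps    # stochastic gradient step on the scale (energy term only)
        s = prox_neg_log(s, step)  # proximal step handles the entropy term
    return m, s


# Hypothetical target, chosen only for this example.
m, s = bbvi_proximal_sgd(target_mean=2.0, target_std=1.0)
print(m, s)
```

Because the target is itself Gaussian, the iterates should hover near the target's mean and standard deviation, with some residual noise from the fixed step size.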

artificial intelligence, machine learning, variational inference, (16 more...)

Country:

Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
(8 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Leisure & Entertainment > Sports > Tennis (0.68)
Government > Regional Government > North America Government > United States Government (0.67)
Transportation > Air (0.62)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

jmstate, a Flexible Python Package for Multi-State Joint Modeling

Laplante, Félix, Ambroise, Christophe, Kuhn, Estelle, Lemler, Sarah

arXiv.org Machine Learning, Oct-9-2025

Classical joint modeling approaches often rely on competing risks or recurrent event formulations to account for complex real-world processes involving evolving longitudinal markers and discrete event occurrences. However, these frameworks typically capture only limited aspects of the underlying event dynamics. Multi-state joint models offer a more flexible alternative by representing full event histories through a network of possible transitions, including recurrent cycles and terminal absorptions, all potentially influenced by longitudinal covariates. In this paper, we propose a general framework that unifies longitudinal biomarker modeling with multi-state event processes defined on arbitrary directed graphs. Our approach accommodates both Markovian and semi-Markovian transition structures, and extends classical joint models by coupling nonlinear mixed-effects longitudinal submodels with multi-state survival processes via shared latent structures. We derive the full likelihood and develop scalable inference procedures based on stochastic gradient descent. Furthermore, we introduce a dynamic prediction framework, enabling individualized risk assessments along complex state-transition trajectories. To facilitate reproducibility and dissemination, we provide an open-source Python library, jmstate, implementing the proposed methodology, available on PyPI (https://pypi.org/project/jmstate/). Simulation experiments and a biomedical case study demonstrate the flexibility and performance of the framework in representing complex longitudinal and multi-state event dynamics. The full Python notebooks used to reproduce the experiments as well as the source code of this paper are available on GitLab (https://gitlab.com/felixlaplante0/jmstate-paper/).

torch, trajectory, transition, (17 more...)

arXiv: 2510.07128

Country:

Europe > Netherlands > South Holland > Rotterdam (0.04)
Europe > France (0.04)
North America > United States (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.67)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Performance of machine-learning-assisted Monte Carlo in sampling from simple statistical physics models

Del Bono, Luca Maria, Ricci-Tersenghi, Federico, Zamponi, Francesco

arXiv.org Artificial Intelligence, Oct-9-2025

Recent years have seen a rise in the application of machine learning techniques to aid the simulation of hard-to-sample systems that cannot be studied using traditional methods. Despite the introduction of many different architectures and procedures, a broad theoretical understanding is still lacking, which risks suboptimal implementations. As a first step to address this gap, we provide here a complete analytic study of the widely used Sequential Tempering procedure applied to a shallow MADE architecture for the Curie-Weiss model. The contribution of this work is twofold. First, we give a description of the optimal weights and of the training under Gradient Descent optimization. Second, we compare what happens in Sequential Tempering with and without the addition of local Metropolis Monte Carlo steps. We are thus able to give theoretical predictions on the best procedure to apply in this case. This work establishes a clear theoretical basis for the integration of machine learning techniques into Monte Carlo sampling and optimization.

artificial intelligence, configuration, machine learning, (18 more...)

doi: 10.1103/s1rm-29zx

arXiv: 2505.22598

Country: Europe (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Federated Multi-Objective Learning

Neural Information Processing Systems, Oct-8-2025, 23:24:05 GMT

Pareto stationary solution that is not improvable for all objectives without sacrificing some objectives.

algorithm, artificial intelligence, machine learning, (17 more...)

Genre: Overview (0.67)

Industry:

Information Technology (0.67)
Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.32)

Gradient Descent with Linearly Correlated Noise: Theory and Applications to Differential Privacy

Anastasia Koloskova

Neural Information Processing Systems, Oct-8-2025, 21:39:05 GMT

We study gradient descent under linearly correlated noise.
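
One way to picture this setting is the toy simulation below, which assumes that "linearly correlated" means the noise added at each step is a fixed linear combination of independent Gaussian draws. The quadratic objective, the mixing coefficients, and the function names are all hypothetical choices for illustration and are not taken from the paper.

```python
import random


def final_iterate(coeffs, step, iters, rng):
    """Run noisy GD on f(x) = 0.5 * x^2 and return the last iterate.

    The noise at step t is sum_j coeffs[j] * draws[t - j], i.e. a linear
    combination of independent standard Gaussian draws (an assumption).
    """
    x = 1.0
    draws = [rng.gauss(0.0, 1.0) for _ in range(iters)]
    for t in range(iters):
        noise = sum(c * draws[t - j] for j, c in enumerate(coeffs) if t - j >= 0)
        grad = x  # gradient of 0.5 * x^2
        x -= step * (grad + noise)
    return x


def mean_squared_final(coeffs, step=0.1, iters=200, trials=2000, seed=0):
    """Average squared distance of the final iterate from the minimizer at 0."""
    rng = random.Random(seed)
    total = sum(final_iterate(coeffs, step, iters, rng) ** 2 for _ in range(trials))
    return total / trials


# Independent noise: each step uses only its own fresh draw.
independent = mean_squared_final([1.0])
# Anti-correlated noise: each step adds a fresh draw and subtracts the previous one.
anticorrelated = mean_squared_final([1.0, -1.0])
print(independent, anticorrelated)
```

In this toy run the anti-correlated variant ends closer to the minimizer on average, because part of each noise term is cancelled at the following step. That is a property of this particular example rather than a statement of the paper's results.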

artificial intelligence, machine learning, mechanism, (15 more...)

Country:

Europe > Russia (0.04)
Asia > Russia (0.04)
Asia > Middle East > Jordan (0.04)
Europe > Switzerland (0.04)

Genre: Research Report (0.46)

Industry: Information Technology > Security & Privacy (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)

Quantum speedups for stochastic optimization

Neural Information Processing Systems, Oct-8-2025, 21:22:58 GMT

We consider the problem of minimizing a continuous function given access to a natural quantum generalization of a stochastic gradient oracle. We provide two new methods for the special case of minimizing a Lipschitz convex function. Each method obtains a dimension versus accuracy trade-off which is provably unachievable classically, and we prove that one method is asymptotically optimal in low-dimensional settings. Additionally, we provide quantum algorithms for computing a critical point of a smooth non-convex function at rates not known to be achievable classically. To obtain these results we build upon the quantum multivariate mean estimation result of Cornelissen et al. [25] and provide a general quantum variance reduction technique of independent interest.

algorithm, artificial intelligence, machine learning, (16 more...)

Country: