AITopics | Mathematical & Statistical Methods

Owing to their stability and convergence speed, extragradient methods have become a staple for solving large-scale saddle-point problems in machine learning. The basic premise of these algorithms is the use of an extrapolation step before performing an update; thanks to this exploration step, extragradient methods overcome many of the non-convergence issues that plague gradient descent/ascent schemes. On the other hand, as we show in this paper, running vanilla extragradient with stochastic gradients may jeopardize its convergence, even in simple bilinear models. To overcome this failure, we investigate a double stepsize extragradient algorithm where the exploration step evolves at a more aggressive time-scale compared to the update step. We show that this modification allows the method to converge even with stochastic gradients, and we derive sharp convergence rates under an error bound condition.

artificial intelligence, convergence, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe (0.68)
North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.54)

Add feedback

Explore Aggressively, Update Conservatively: Stochastic Extragradient Methods with Variable Stepsize Scaling

Neural Information Processing SystemsMar-20-2025, 10:21:37 GMT

Owing to their stability and convergence speed, extragradient methods have become a staple for solving large-scale saddle-point problems in machine learning. The basic premise of these algorithms is the use of an extrapolation step before performing an update; thanks to this exploration step, extragradient methods overcome many of the non-convergence issues that plague gradient descent/ascent schemes. On the other hand, as we show in this paper, running vanilla extragradient with stochastic gradients may jeopardize its convergence, even in simple bilinear models. To overcome this failure, we investigate a double stepsize extragradient algorithm where the exploration step evolves at a more aggressive time-scale compared to the update step. We show that this modification allows the method to converge even with stochastic gradients, and we derive sharp convergence rates under an error bound condition.

artificial intelligence, convergence, machine learning, (14 more...)

Neural Information Processing Systems

Country: Europe > France (0.16)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.54)

Add feedback

Higher Order Kernel Mean Embeddings to Capture Filtrations of Stochastic Processes

Neural Information Processing SystemsMar-20-2025, 05:45:36 GMT

Stochastic processes are random variables with values in some space of paths. However, reducing a stochastic process to a path-valued random variable ignores its filtration, i.e. the flow of information carried by the process through time. By conditioning the process on its filtration, we introduce a family of higher order kernel mean embeddings (KMEs) that generalizes the notion of KME and captures additional information related to the filtration. We derive empirical estimators for the associated higher order maximum mean discrepancies (MMDs) and prove consistency. We then construct a filtration-sensitive kernel two-sample test able to pick up information that gets missed by the standard MMD test. In addition, leveraging our higher order MMDs we construct a family of universal kernels on stochastic processes that allows to solve real-world calibration and optimal stopping problems in quantitative finance (such as the pricing of American options) via classical kernel-based regression methods. Finally, adapting existing tests for conditional independence to the case of stochastic processes, we design a causaldiscovery algorithm to recover the causal graph of structural dependencies among interacting bodies solely from observations of their multidimensional trajectories.

artificial intelligence, machine learning, stochastic process, (14 more...)

Neural Information Processing Systems

Industry: Banking & Finance > Trading (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Kernel methods through the roof: handling billions of points efficiently

Neural Information Processing SystemsMar-20-2025, 03:08:51 GMT

Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems, since naïve implementations scale poorly with data size. Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections. Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware. Towards this end, we designed a preconditioned gradient solver for kernel methods exploiting both GPU acceleration and parallelization with multiple GPUs, implementing out-of-core variants of common linear algebra operations to guarantee optimal hardware utilization. Further, we optimize the numerical precision of different operations and maximize efficiency of matrix-vector multiplications. As a result we can experimentally show dramatic speedups on datasets with billions of points, while still guaranteeing state of the art performance.

artificial intelligence, dataset, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America (0.46)
Europe (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

8452a95c40e2b232acd9b8a8712935d7-Paper.pdf

Neural Information Processing SystemsMar-20-2025, 02:51:27 GMT

artificial intelligence, change point, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre:

Overview (0.67)
Research Report > New Finding (0.46)

Industry: Banking & Finance > Trading (0.47)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.46)

Add feedback

Walking in the Shadow: A New Perspective on Descent Directions for Constrained Minimization

Neural Information Processing SystemsMar-19-2025, 21:16:24 GMT

Descent directions such as movement towards Frank-Wolfe vertices, away steps, in-face away steps and pairwise directions have been an important design consideration in conditional gradient descent (CGD) variants. In this work, we attempt to demystify the impact of movement in these directions towards attaining constrained minimizers. The best local direction of descent is the directional derivative of the projection of the gradient, which we refer to as the shadow of the gradient. We show that the continuous-time dynamics of moving in the shadow are equivalent to those of PGD however non-trivial to discretize. By projecting gradients in PGD, one not only ensures feasibility but also is able to "wrap" around the convex region. We show that Frank-Wolfe (FW) vertices in fact recover the maximal wrap one can obtain by projecting gradients, thus providing a new perspective to these steps. We also claim that the shadow steps give the best direction of descent emanating from the convex hull of all possible away-vertices. Opening up the PGD movements in terms of shadow steps gives linear convergence, dependent on the number of faces.

artificial intelligence, machine learning, projection curve, (17 more...)

Neural Information Processing Systems

Country:

North America (0.45)
Europe (0.27)

Genre: Research Report (0.67)

Industry: Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.46)

Add feedback

Filters

Collaborating Authors

Mathematical & Statistical Methods

36ce475705c1dc6c50a5956cedff3d01-Supplemental-Conference.pdf

36ce475705c1dc6c50a5956cedff3d01-Paper-Conference.pdf

Convergence of Mean-field Langevin dynamics: Time-space discretization, stochastic gradient, and variance reduction Taiji Suzuki 1,2, Denny Wu

97275a23ca44226c9964043c8462be96-Supplemental.pdf

Explore Aggressively, Update Conservatively: Stochastic Extragradient Methods with Variable Stepsize Scaling

Explore Aggressively, Update Conservatively: Stochastic Extragradient Methods with Variable Stepsize Scaling

Higher Order Kernel Mean Embeddings to Capture Filtrations of Stochastic Processes

Kernel methods through the roof: handling billions of points efficiently

8452a95c40e2b232acd9b8a8712935d7-Paper.pdf

Walking in the Shadow: A New Perspective on Descent Directions for Constrained Minimization