Escaping Saddle-Point Faster under Interpolation-like Conditions
In this paper, we show that under over-parametrization several standard stochastic optimization algorithms escape saddle-points and converge to local-minimizers much faster. One of the fundamental aspects of over-parametrized models is that they are capable of interpolating the training data. We show that, under interpolation-like assumptions satisfied by the stochastic gradients in an over-parametrization setting, the first-order oracle complexity of the Perturbed Stochastic Gradient Descent (PSGD) algorithm to reach an $\epsilon$-local-minimizer matches the corresponding deterministic rate of $O(1/\epsilon^{2})$. We next analyze the Stochastic Cubic-Regularized Newton (SCRN) algorithm under interpolation-like conditions, and show that its oracle complexity to reach an $\epsilon$-local-minimizer is $O(1/\epsilon^{2.5})$. While this complexity is better than that of either PSGD or SCRN without interpolation-like assumptions, it does not match the rate of $O(1/\epsilon^{1.5})$ of the deterministic Cubic-Regularized Newton method.
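As an illustration of the first algorithm discussed above, here is a minimal sketch of perturbed gradient descent: plain gradient steps, plus a small uniform-ball perturbation injected whenever the gradient is small, which is what lets the method escape strict saddle points. This is a toy sketch, not the paper's exact algorithm; all names and hyperparameter values are illustrative assumptions.

```python
import numpy as np

def psgd(grad_fn, x0, lr=0.01, noise_radius=0.1, grad_threshold=0.05,
         n_steps=1000, rng=None):
    """Perturbed gradient descent sketch: descend along (stochastic)
    gradients, and add a small uniform-ball perturbation whenever the
    gradient is small, to escape strict saddle points."""
    rng = np.random.default_rng(rng)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        g = grad_fn(x)
        if np.linalg.norm(g) < grad_threshold:
            # sample a perturbation uniformly from a ball of radius noise_radius
            direction = rng.normal(size=x.shape)
            direction /= np.linalg.norm(direction)
            x = x + noise_radius * rng.random() ** (1 / x.size) * direction
        x = x - lr * g
    return x
```

On the saddle function $f(x) = x_0^2 - x_1^2$, starting exactly at the saddle point $(0, 0)$ where the gradient vanishes, the perturbation lets the iterates escape along the negative-curvature direction.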
Stochastic Optimization in Semi-Discrete Optimal Transport: Convergence Analysis and Minimax Rate
Genans, Ferdinand, Godichon-Baggioni, Antoine, Vialard, François-Xavier, Wintenberger, Olivier
We investigate the semi-discrete Optimal Transport (OT) problem, where a continuous source measure $μ$ is transported to a discrete target measure $ν$, with particular attention to the approximation of the OT map. In this setting, Stochastic Gradient Descent (SGD) based solvers have demonstrated strong empirical performance in recent machine learning applications, yet their theoretical guarantees for approximating the OT map remain an open question. In this work, we answer it positively by providing both computational and statistical convergence guarantees for SGD. Specifically, we show that SGD methods can estimate the OT map with a minimax convergence rate of $\mathcal{O}(1/\sqrt{n})$, where $n$ is the number of samples drawn from $μ$. To establish this result, we study the averaged projected SGD algorithm, and identify a suitable projection set that contains a minimizer of the objective, even when the source measure is not compactly supported. Our analysis holds under mild assumptions on the source measure and applies to MTW cost functions, which include $\|\cdot\|^p$ for $p \in (1, \infty)$. We finally provide numerical evidence for our theoretical results.
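The averaged projected SGD approach mentioned above can be sketched on the semi-dual formulation of semi-discrete OT. The following toy implementation (squared-Euclidean cost, ball projection, Polyak-Ruppert averaging) is illustrative only, not the authors' exact method; the function name, step-size schedule, and projection radius are all assumptions.

```python
import numpy as np

def semidiscrete_ot_sgd(sample_source, targets, target_weights,
                        n_iters=20000, lr0=1.0, proj_radius=10.0, rng=None):
    """Averaged projected SGD on the semi-dual of semi-discrete OT with
    squared-Euclidean cost: maximize E_x[min_j(|x - y_j|^2 - v_j)] + <v, nu>
    over dual weights v, with projection onto a ball and iterate averaging."""
    rng = np.random.default_rng(rng)
    m = len(targets)
    v = np.zeros(m)
    v_avg = np.zeros(m)
    for t in range(1, n_iters + 1):
        x = sample_source(rng)                       # one sample from the source
        costs = np.sum((targets - x) ** 2, axis=1) - v
        j = int(np.argmin(costs))                    # x is transported to y_j
        grad = target_weights.copy()
        grad[j] -= 1.0                               # stochastic supergradient in v
        v = v + (lr0 / np.sqrt(t)) * grad            # ascent step
        norm = np.linalg.norm(v)
        if norm > proj_radius:
            v *= proj_radius / norm                  # project onto the ball
        v_avg += (v - v_avg) / t                     # Polyak-Ruppert averaging
    return v_avg
```

For a uniform source on $[0, 1]$ and two targets at $0.25$ and $0.75$ with masses $(0.6, 0.4)$, the optimal dual weights place the cell boundary $0.5 + v_0 - v_1$ at $0.6$, so each cell receives exactly its target mass.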
Monitoring State Transitions in Markovian Systems with Sampling Cost
Saurav, Kumar, Shroff, Ness B., Liang, Yingbin
We consider a node-monitor pair, where the node's state varies with time. The monitor needs to track the node's state at all times; however, there is a fixed cost for each state query. So the monitor may instead predict the state using time-series forecasting methods, including time-series foundation models (TSFMs), and query only when prediction uncertainty is high. Since query decisions influence prediction accuracy, determining when to query is nontrivial. A natural approach is a greedy policy that predicts when the expected prediction loss is below the query cost and queries otherwise. We analyze this policy in a Markovian setting, where the optimal (OPT) strategy is a state-dependent threshold policy minimizing the time-averaged sum of query cost and prediction losses. We show that, in general, the greedy policy is suboptimal and can have an unbounded competitive ratio, but under common conditions such as identically distributed transition probabilities, it performs close to OPT. For the case of unknown transition probabilities, we further propose a projected stochastic gradient descent (PSGD)-based learning variant of the greedy policy, which achieves a favorable predict-query tradeoff with improved computational efficiency compared to OPT.
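A minimal sketch of the greedy policy described above, assuming a known transition matrix and 0-1 prediction loss: the monitor propagates a belief over states, and queries exactly when the expected prediction loss (one minus the maximum belief mass) is at least the query cost. Function names and the loss choice are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def greedy_monitor(P, states, query_cost):
    """Greedy monitoring sketch: track a belief over the node's state,
    predict the most likely state, and query only when the expected
    0-1 prediction loss (1 - max belief) is at least the query cost.
    Returns the total incurred cost (query costs plus prediction losses)."""
    n = P.shape[0]
    belief = np.zeros(n)
    belief[states[0]] = 1.0                 # initial state is known
    total = 0.0
    for s in states[1:]:                    # true trajectory, unseen unless queried
        belief = belief @ P                 # propagate belief one step
        if 1.0 - belief.max() >= query_cost:
            total += query_cost             # query: pay the cost, learn the state
            belief = np.zeros(n)
            belief[s] = 1.0
        else:
            total += float(int(belief.argmax()) != s)  # incur 0-1 prediction loss
    return total
```

For a sticky two-state chain with stay-probability 0.9 and query cost 0.05, the expected loss one step after a query is 0.1, so this greedy policy queries at every step; with a query cost above 0.5 it never queries, since the expected 0-1 loss over two states cannot exceed 0.5.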
Position-based Scaled Gradient for Model Quantization and Pruning - Appendix
In this experiment, we only quantize the weights, not the activations, to compare the performance degradation as the weight bit-width decreases. The mean squared errors (MSE) of the weights across different bit-widths are also reported. In Fig. A1, we display the full-precision weight distributions of the PSGD models and compare them with their SGD-trained counterparts. Four random layers of each model are shown column-wise. The first row displays the model trained with SGD and L2 weight decay. This is also reported in Figure 1 of the original paper.
Figure 5: Loss surface using [35]; SGD (top) and PSGD (bottom)
We thank the reviewers for their positive and constructive feedback. Note that our PSGD model has accuracy similar to the SGD-trained model at FP. A similar rationale is given in Sec. Note that at lower bits such as W2A8, we attain 62.7% accuracy, while LAPQ attains 1.3% accuracy. The detailed definition and proof are in [38].