AITopics | stochastic

8b2fc235787852ead92da2268cd9e90c-Paper-Conference.pdf

Neural Information Processing Systems, Feb-10-2026, 15:02:46 GMT

In recent years, deep learning has become a staple solution to different tasks, such as computer vision, bio-informatics, speech recognition, and many more.

artificial intelligence, machine learning, neuron, (18 more...)

Neural Information Processing Systems

Country:

Europe > Italy > Tuscany > Florence (0.04)
Europe > Italy > Piedmont > Turin Province > Turin (0.04)
Europe > France (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)


Towards Optimal Communication Complexity in Distributed Non-Convex Optimization

Neural Information Processing Systems, Feb-9-2026, 02:31:41 GMT

Even when $F_m$'s are not identical, for high levels of

artificial intelligence, machine learning, mkr 2 3, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.69)


Implicit Bias of (Stochastic) Gradient Descent for Rank-1 Linear Neural Network

Neural Information Processing Systems, Dec-26-2025, 14:29:32 GMT

Studying the implicit bias of gradient descent (GD) and stochastic gradient descent (SGD) is critical to unveil the underlying mechanism of deep learning. Unfortunately, even for standard linear networks in regression setting, a comprehensive characterization of the implicit bias is still an open problem. This paper proposes to investigate a new proxy model of standard linear network, rank-1 linear network, where each weight matrix is parameterized as a rank-1 form. For over-parameterized regression problem, we precisely analyze the implicit bias of GD and SGD---by identifying a "potential" function such that GD converges to its minimizer constrained by zero training error (i.e., interpolation solution), and further characterizing the role of the noise introduced by SGD in perturbing the form of this potential. Our results explicitly connect the depth of the network and the initialization with the implicit bias of GD and SGD. Furthermore, we emphasize a new implicit bias of SGD jointly induced by stochasticity and over-parameterization, which can reduce the dependence of the SGD's solution on the initialization. Our findings regarding the implicit bias are different from that of a recently popular model, the diagonal linear network. We highlight that the induced bias of our rank-1 model is more consistent with standard linear network while the diagonal one is not. This suggests that the proposed rank-1 linear network might be a plausible proxy for standard linear net.

gradient descent, implicit bias, linear network, (9 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.59)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)


Autonomous Capability Assessment of Sequential Decision-Making Systems in Stochastic Settings

Neural Information Processing Systems, Dec-26-2025, 13:20:36 GMT

It is essential for users to understand what their AI systems can and can't do in order to use them safely. However, the problem of enabling users to assess AI systems with sequential decision-making (SDM) capabilities is relatively understudied. This paper presents a new approach for modeling the capabilities of black-box AI systems that can plan and act, along with the possible effects and requirements for executing those capabilities in stochastic settings. We present an active-learning approach that can effectively interact with a black-box SDM system and learn an interpretable probabilistic model describing its capabilities. Theoretical analysis of the approach identifies the conditions under which the learning process is guaranteed to converge to the correct model of the agent; empirical evaluations on different agents and simulated scenarios show that this approach is few-shot generalizable and can effectively describe the capabilities of arbitrary black-box SDM agents in a sample-efficient manner.

autonomous capability assessment, name change, sequential decision-making system, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Stochastic Distributed Optimization under Average Second-order Similarity: Algorithms and Analysis

Neural Information Processing Systems, Dec-23-2025, 18:47:02 GMT

We study finite-sum distributed optimization problems involving a master node and $n-1$ local nodes under the popular $\delta$-similarity and $\mu$-strong convexity conditions. We propose two new algorithms, SVRS and AccSVRS, motivated by previous works. The non-accelerated SVRS method combines the techniques of gradient sliding and variance reduction and achieves a better communication complexity of $\tilde{\mathcal{O}}(n {+} \sqrt{n}\delta/\mu)$ compared to existing non-accelerated algorithms.

average second-order similarity, name change, optimization, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.65)


Armijo Line-search Makes (Stochastic) Gradient Descent Go Fast

Vaswani, Sharan, Babanezhad, Reza

arXiv.org Machine Learning, Feb-28-2025

Armijo line-search (Armijo-LS) is a standard method to set the step-size for gradient descent (GD). For smooth functions, Armijo-LS alleviates the need to know the global smoothness constant $L$ and adapts to the local smoothness, enabling GD to converge faster. However, existing theoretical analyses of GD with Armijo-LS (GD-LS) do not characterize this fast convergence. We show that if the objective function satisfies a certain non-uniform smoothness condition, GD-LS converges provably faster than GD with a constant $1/L$ step-size (denoted as GD(1/L)). Our results imply that for convex losses corresponding to logistic regression and multi-class classification, GD-LS can converge to the optimum at a linear rate and, hence, improve over the sublinear convergence of GD(1/L). Furthermore, for non-convex losses satisfying gradient domination (for example, those corresponding to the softmax policy gradient in RL or generalized linear models with a logistic link function), GD-LS can match the fast convergence of algorithms tailored for these specific settings. Finally, we prove that under the interpolation assumption, for convex losses, stochastic GD with a stochastic line-search can match the fast convergence of GD-LS.
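As a rough illustration of the Armijo line-search mentioned in the abstract above, here is a minimal sketch of gradient descent with a backtracking step-size. It is not the authors' implementation: the function names (`armijo_step`, `gd_armijo`), the constants `eta_max`, `c` and `beta`, and the quadratic test objective are all assumptions chosen for illustration.

```python
def armijo_step(f, grad, x, eta_max=1.0, c=0.5, beta=0.5, max_backtracks=50):
    """Take one gradient step, shrinking eta until the Armijo condition holds.

    Armijo (sufficient decrease) condition:
        f(x - eta * g) <= f(x) - c * eta * ||g||^2
    All default constants here are illustrative assumptions.
    """
    g = grad(x)
    g_sq = sum(gi * gi for gi in g)
    fx = f(x)
    eta = eta_max
    for _ in range(max_backtracks):
        x_new = [xi - eta * gi for xi, gi in zip(x, g)]
        if f(x_new) <= fx - c * eta * g_sq:
            return x_new, eta
        eta *= beta  # step too large for the local curvature: backtrack
    # Fallback: return the last (smallest) trial step if none was accepted.
    return x_new, eta


def gd_armijo(f, grad, x0, iters=100, **line_search_kwargs):
    """Run gradient descent, choosing each step-size by Armijo backtracking."""
    x = list(x0)
    for _ in range(iters):
        x, _ = armijo_step(f, grad, x, **line_search_kwargs)
    return x


# Hypothetical test objective: an ill-conditioned quadratic with L = 10.
def quad(x):
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def quad_grad(x):
    return [x[0], 10.0 * x[1]]


if __name__ == "__main__":
    x_final = gd_armijo(quad, quad_grad, [3.0, -2.0], iters=200)
    print("final point:", x_final, "objective:", quad(x_final))
```

Note that the step-size is found without passing the smoothness constant $L$ to the routine, which is the practical appeal the abstract describes. The faster rates claimed in the paper depend on its non-uniform smoothness condition, which this toy quadratic does not exercise.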

armijo line-search make, convergence, gd-ls, (13 more...)

arXiv.org Machine Learning

2503.00229

Country:

Africa > Senegal > Kolda Region > Kolda (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Italy (0.04)

Genre: Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)


Implicit Bias of (Stochastic) Gradient Descent for Rank-1 Linear Neural Network

Neural Information Processing Systems, Jan-19-2025, 19:59:25 GMT

Studying the implicit bias of gradient descent (GD) and stochastic gradient descent (SGD) is critical to unveil the underlying mechanism of deep learning. Unfortunately, even for standard linear networks in regression setting, a comprehensive characterization of the implicit bias is still an open problem. This paper proposes to investigate a new proxy model of standard linear network, rank-1 linear network, where each weight matrix is parameterized as a rank-1 form. For over-parameterized regression problem, we precisely analyze the implicit bias of GD and SGD---by identifying a "potential" function such that GD converges to its minimizer constrained by zero training error (i.e., interpolation solution), and further characterizing the role of the noise introduced by SGD in perturbing the form of this potential. Our results explicitly connect the depth of the network and the initialization with the implicit bias of GD and SGD.

implicit bias, linear network, rank-1 linear neural network, (7 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.41)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)


Autonomous Capability Assessment of Sequential Decision-Making Systems in Stochastic Settings

Neural Information Processing Systems, Jan-19-2025, 18:37:07 GMT

It is essential for users to understand what their AI systems can and can't do in order to use them safely. However, the problem of enabling users to assess AI systems with sequential decision-making (SDM) capabilities is relatively understudied. This paper presents a new approach for modeling the capabilities of black-box AI systems that can plan and act, along with the possible effects and requirements for executing those capabilities in stochastic settings. We present an active-learning approach that can effectively interact with a black-box SDM system and learn an interpretable probabilistic model describing its capabilities. Theoretical analysis of the approach identifies the conditions under which the learning process is guaranteed to converge to the correct model of the agent; empirical evaluations on different agents and simulated scenarios show that this approach is few-shot generalizable and can effectively describe the capabilities of arbitrary black-box SDM agents in a sample-efficient manner.

autonomous capability assessment, sequential decision-making system, stochastic, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)


Stochastic normalizing flows for Effective String Theory

Caselle, Michele, Cellini, Elia, Nada, Alessandro

arXiv.org Artificial Intelligence, Jan-8-2025

Effective String Theory (EST) is a powerful tool used to study confinement in pure gauge theories by modeling the confining flux tube connecting a static quark-anti-quark pair as a thin vibrating string. Recently, flow-based samplers have been applied as an efficient numerical method to study EST regularized on the lattice, opening the route to study observables previously inaccessible to standard analytical methods. Flow-based samplers are a class of algorithms based on Normalizing Flows (NFs), deep generative models recently proposed as a promising alternative to traditional Markov Chain Monte Carlo methods in lattice field theory calculations. By combining NF layers with out-of-equilibrium stochastic updates, we obtain Stochastic Normalizing Flows (SNFs), a scalable class of machine learning algorithms that can be explained in terms of stochastic thermodynamics. In this contribution, we outline EST and SNFs, and report some numerical results for the shape of the flux tube.

flux tube, gauge theory, stochastic, (15 more...)

arXiv.org Artificial Intelligence

2412.19109

Country:

North America > United States > Nebraska > Adams County > Hastings (0.04)
Europe > United Kingdom > England > Merseyside > Liverpool (0.04)
Europe > Italy > Piedmont > Turin Province > Turin (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)


On the Convergence of (Stochastic) Gradient Descent for Kolmogorov--Arnold Networks

Gao, Yihang, Tan, Vincent Y. F.

arXiv.org Artificial Intelligence, Oct-10-2024

Kolmogorov--Arnold Networks (KANs), a recently proposed neural network architecture, have gained significant attention in the deep learning community, due to their potential as a viable alternative to multi-layer perceptrons (MLPs) and their broad applicability to various scientific tasks. Empirical investigations demonstrate that KANs optimized via stochastic gradient descent (SGD) are capable of achieving near-zero training loss in various machine learning (e.g., regression, classification, and time series forecasting, etc.) and scientific tasks (e.g., solving partial differential equations). In this paper, we provide a theoretical explanation for the empirical success by conducting a rigorous convergence analysis of gradient descent (GD) and SGD for two-layer KANs in solving both regression and physics-informed tasks. For regression problems, we establish using the neural tangent kernel perspective that GD achieves global linear convergence of the objective function when the hidden dimension of KANs is sufficiently large. We further extend these results to SGD, demonstrating a similar global convergence in expectation. Additionally, we analyze the global convergence of GD and SGD for physics-informed KANs, which unveils additional challenges due to the more complex loss structure. This is the first work establishing the global convergence guarantees for GD and SGD applied to optimize KANs and physics-informed KANs.

arnold network, convergence, gradient descent, (2 more...)

arXiv.org Artificial Intelligence

2410.08041

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)



