AITopics | stochastic update

Collaborating Authors

stochastic update

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Dual-Free Stochastic Decentralized Optimization with Variance Reduction

Neural Information Processing SystemsDec-24-2025, 19:07:38 GMT

We consider the problem of training machine learning models on distributed data in a decentralized way. For finite-sum problems, fast single-machine algorithms for large datasets rely on stochastic updates combined with variance reduction. Yet, existing decentralized stochastic algorithms either do not obtain the full speedup allowed by stochastic updates, or require oracles that are more expensive than regular gradients. In this work, we introduce a Decentralized stochastic algorithm with Variance Reduction called DVR. DVR only requires computing stochastic gradients of the local functions, and is computationally as fast as a standard stochastic variance-reduced algorithms run on a $1/n$ fraction of the dataset, where $n$ is the number of nodes. To derive DVR, we use Bregman coordinate descent on a well-chosen dual problem, and obtain a dual-free algorithm using a specific Bregman divergence. We give an accelerated version of DVR based on the Catalyst framework, and illustrate its effectiveness with simulations on real data.

dual-free stochastic decentralized optimization, name change, variance reduction, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Scale Up Nonlinear Component Analysis with Doubly Stochastic Gradients

Bo Xie, Yingyu Liang, Le Song

Neural Information Processing SystemsOct-2-2025, 09:11:36 GMT

Nonlinear component analysis such as kernel Principle Component Analysis (KPCA) and kernel Canonical Correlation Analysis (KCCA) are widely used in machine learning, statistics and data analysis, but they cannot scale up to big datasets. Recent attempts have employed random feature approximations to convert the problem to the primal form for linear computational complexity. However, to obtain high quality solutions, the number of random features should be the same order of magnitude as the number of data points, making such approach not directly applicable to the regime with millions of data points. We propose a simple, computationally efficient, and memory friendly algorithm based on the "doubly stochastic gradients" to scale up a range of kernel nonlinear component analysis, such as kernel PCA, CCA and SVD. Despite the non-convex nature of these problems, our method enjoys theoretical guarantees that it converges at the rate O (1 /t) to the global optimum, even for the top k eigen subspace. Unlike many alternatives, our algorithm does not require explicit orthogonaliza-tion, which is infeasible on big datasets. We demonstrate the effectiveness and scalability of our algorithm on large scale synthetic and real world datasets.

algorithm, dataset, random feature, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.86)

Add feedback

Dual-Free Stochastic Decentralized Optimization with Variance Reduction

Neural Information Processing SystemsMay-27-2025, 14:23:35 GMT

We consider the problem of training machine learning models on distributed data in a decentralized way. For finite-sum problems, fast single-machine algorithms for large datasets rely on stochastic updates combined with variance reduction. Yet, existing decentralized stochastic algorithms either do not obtain the full speedup allowed by stochastic updates, or require oracles that are more expensive than regular gradients. In this work, we introduce a Decentralized stochastic algorithm with Variance Reduction called DVR. DVR only requires computing stochastic gradients of the local functions, and is computationally as fast as a standard stochastic variance-reduced algorithms run on a 1/n fraction of the dataset, where n is the number of nodes. To derive DVR, we use Bregman coordinate descent on a well-chosen dual problem, and obtain a dual-free algorithm using a specific Bregman divergence.

artificial intelligence, machine learning, variance reduction, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Stochastic Variational Inference with Tuneable Stochastic Annealing

Paisley, John, Fazelnia, Ghazal, Barr, Brian

arXiv.org Machine LearningApr-4-2025

In this paper, we exploit the observation that stochastic variational inference (SVI) is a form of annealing and present a modified SVI approach -- applicable to both large and small datasets -- that allows the amount of annealing done by SVI to be tuned. We are motivated by the fact that, in SVI, the larger the batch size the more approximately Gaussian is the intrinsic noise, but the smaller its variance. This low variance reduces the amount of annealing which is needed to escape bad local optimal solutions. We propose a simple method for achieving both goals of having larger variance noise to escape bad local optimal solutions and more data information to obtain more accurate gradient directions. The idea is to set an actual batch size, which may be the size of the data set, and a smaller effective batch size that matches the larger level of variance at this smaller batch size. The result is an approximation to the maximum entropy stochastic gradient at this variance level. We theoretically motivate our approach for the framework of conjugate exponential family models and illustrate the method empirically on the probabilistic matrix factorization collaborative filter, the Latent Dirichlet Allocation topic model, and the Gaussian mixture model.

artificial intelligence, machine learning, optimization problem, (18 more...)

arXiv.org Machine Learning

2504.03902

Country:

North America > United States (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

Add feedback

Dual-Free Stochastic Decentralized Optimization with Variance Reduction

Neural Information Processing SystemsOct-11-2024, 14:26:47 GMT

We consider the problem of training machine learning models on distributed data in a decentralized way. For finite-sum problems, fast single-machine algorithms for large datasets rely on stochastic updates combined with variance reduction. Yet, existing decentralized stochastic algorithms either do not obtain the full speedup allowed by stochastic updates, or require oracles that are more expensive than regular gradients. In this work, we introduce a Decentralized stochastic algorithm with Variance Reduction called DVR. DVR only requires computing stochastic gradients of the local functions, and is computationally as fast as a standard stochastic variance-reduced algorithms run on a 1/n fraction of the dataset, where n is the number of nodes. To derive DVR, we use Bregman coordinate descent on a well-chosen dual problem, and obtain a dual-free algorithm using a specific Bregman divergence.

algorithm, dual-free stochastic decentralized optimization, variance reduction, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

StoDIP: Efficient 3D MRF image reconstruction with deep image priors and stochastic iterations

Mayo, Perla, Cencini, Matteo, Pirkl, Carolin M., Menzel, Marion I., Tosetti, Michela, Menze, Bjoern H., Golbabaee, Mohammad

arXiv.org Artificial IntelligenceAug-5-2024

Magnetic Resonance Fingerprinting (MRF) is a time-efficient approach to quantitative MRI for multiparametric tissue mapping. The reconstruction of quantitative maps requires tailored algorithms for removing aliasing artefacts from the compressed sampled MRF acquisitions. Within approaches found in the literature, many focus solely on two-dimensional (2D) image reconstruction, neglecting the extension to volumetric (3D) scans despite their higher relevance and clinical value. A reason for this is that transitioning to 3D imaging without appropriate mitigations presents significant challenges, including increased computational cost and storage requirements, and the need for large amount of ground-truth (artefact-free) data for training. To address these issues, we introduce StoDIP, a new algorithm that extends the ground-truth-free Deep Image Prior (DIP) reconstruction to 3D MRF imaging. StoDIP employs memory-efficient stochastic updates across the multicoil MRF data, a carefully selected neural network architecture, as well as faster nonuniform FFT (NUFFT) transformations. This enables a faster convergence compared against a conventional DIP implementation without these features. Tested on a dataset of whole-brain scans from healthy volunteers, StoDIP demonstrated superior performance over the ground-truth-free reconstruction baselines, both quantitatively and qualitatively.

reconstruction, resonance, stodip, (16 more...)

arXiv.org Artificial Intelligence

2408.02367

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Italy (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
(4 more...)

Genre: Research Report (0.40)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Add feedback

Scale Up Nonlinear Component Analysis with Doubly Stochastic Gradients, Le Song

Neural Information Processing SystemsMar-13-2024, 00:31:20 GMT

Nonlinear component analysis such as kernel Principle Component Analysis (KPCA) and kernel Canonical Correlation Analysis (KCCA) are widely used in machine learning, statistics and data analysis, but they cannot scale up to big datasets. Recent attempts have employed random feature approximations to convert the problem to the primal form for linear computational complexity. However, to obtain high quality solutions, the number of random features should be the same order of magnitude as the number of data points, making such approach not directly applicable to the regime with millions of data points. We propose a simple, computationally efficient, and memory friendly algorithm based on the "doubly stochastic gradients" to scale up a range of kernel nonlinear component analysis, such as kernel PCA, CCA and SVD. Despite the non-convex nature of these problems, our method enjoys theoretical guarantees that it converges at the rate Õ(1/t) to the global optimum, even for the top k eigen subspace. Unlike many alternatives, our algorithm does not require explicit orthogonalization, which is infeasible on big datasets. We demonstrate the effectiveness and scalability of our algorithm on large scale synthetic and real world datasets.

algorithm, dataset, random feature, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > New York (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.86)

Add feedback

Kronecker Determinantal Point Processes

Neural Information Processing SystemsMar-12-2024, 16:58:07 GMT

Determinantal Point Processes (DPPs) are probabilistic models over all subsets a ground set of N items. They have recently gained prominence in several applications that rely on "diverse" subsets.

algorithm, icard, kernel, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Pennsylvania (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Add feedback

Does Spending More Always Ensure Higher Cooperation? An Analysis of Institutional Incentives on Heterogeneous Networks

Cimpeanu, Theodor, Santos, Francisco C, Han, The Anh

arXiv.org Artificial IntelligenceJan-16-2023

Humans have developed considerable machinery used at scale to create policies and to distribute incentives, yet we are forever seeking ways in which to improve upon these, our institutions. Especially when funding is limited, it is imperative to optimise spending without sacrificing positive outcomes, a challenge which has often been approached within several areas of social, life and engineering sciences. These studies often neglect the availability of information, cost restraints, or the underlying complex network structures, which define real-world populations. Here, we have extended these models, including the aforementioned concerns, but also tested the robustness of their findings to stochastic social learning paradigms. Akin to real-world decisions on how best to distribute endowments, we study several incentive schemes, which consider information about the overall population, local neighbourhoods, or the level of influence which a cooperative node has in the network, selectively rewarding cooperative behaviour if certain criteria are met. Following a transition towards a more realistic network setting and stochastic behavioural update rule, we found that carelessly promoting cooperators can often lead to their downfall in socially diverse settings. These emergent cyclic patterns not only damage cooperation, but also decimate the budgets of external investors. Our findings highlight the complexity of designing effective and cogent investment policies in socially diverse populations.

artificial intelligence, interference, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2301.0662

Country: