AITopics | without-replacement sampling

Without-Replacement Sampling for Stochastic Gradient Methods

Neural Information Processing Systems, Nov-21-2025, 15:22:43 GMT

Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled with replacement. In contrast, sampling without replacement is far less understood, yet in practice it is very common, often easier to implement, and usually performs better. In this paper, we provide competitive convergence guarantees for without-replacement sampling under several scenarios, focusing on the natural regime of few passes over the data. Moreover, we describe a useful application of these results in the context of distributed optimization with randomly-partitioned data, yielding a nearly-optimal algorithm for regularized least squares (in terms of both communication complexity and runtime complexity) under broad parameter regimes. Our proof techniques combine ideas from stochastic optimization, adversarial online learning and transductive learning theory, and can potentially be applied to other stochastic optimization and learning problems.

name change, stochastic gradient method, without-replacement sampling, (3 more...)

Neural Information Processing Systems

Industry: Education (0.62)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)

Provably Faster Algorithms for Bilevel Optimization via Without-Replacement Sampling

Neural Information Processing Systems, May-27-2025, 06:58:25 GMT

Bilevel optimization has experienced significant advancements recently with the introduction of new efficient algorithms. Mirroring the success in single-level optimization, stochastic gradient-based algorithms are widely used in bilevel optimization. However, a common limitation of these algorithms is the presumption of independent sampling, which can lead to increased computational costs due to the unique hyper-gradient structure in bilevel problems. To address this challenge, we study the example-selection strategy for bilevel optimization in this work. More specifically, we introduce an algorithm based on without-replacement sampling that achieves a faster convergence rate than its counterparts that rely on independent sampling.

algorithm, bilevel optimization, without-replacement sampling, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)
Information Technology > Artificial Intelligence > Machine Learning (0.46)

Reviews: Without-Replacement Sampling for Stochastic Gradient Methods

Neural Information Processing Systems, Jan-20-2025, 19:36:11 GMT

The paper studies the problem of minimizing the average of a finite sum of convex functions over a convex domain using stochastic algorithms that, as opposed to most popular methods, apply without-replacement sampling to the data. More specifically, the paper considers methods that first randomly permute the functions and then process them via incremental updates, one at a time, making at most a single pass over the data (hence the data is shuffled only once). There are three main motivations for considering stochastic methods with without-replacement sampling:

1. It has been observed empirically many times (and to the best of my knowledge this is what ML practitioners really do) that randomly shuffling the data and then applying incremental updates works better than with-replacement SGD.
2. After the data is randomly permuted, these algorithms require only sequential access to memory, which is much more efficient than standard with-replacement SGD methods, which require random access to perform the updates. Since the main setting under consideration here is at most a single pass, it sounds plausible to assume that the data is already stored in some random permutation; the algorithm is then fully sequential, and there is no need to even artificially permute the data.
3. In a distributed setting in which the data is partitioned across several machines (without assuming the data is sampled i.i.d. from a distribution), it is more natural and efficient to analyze without-replacement algorithms.
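The single-shuffle procedure the review describes (permute once, then one sequential pass of incremental updates) can be sketched roughly as follows, next to a with-replacement baseline. This is only an illustrative sketch, not the paper's algorithm or analysis: the scalar least-squares objective, the fixed step size, and the function names `single_shuffle_sgd` and `with_replacement_sgd` are assumptions made for the example.

```python
import random


def single_shuffle_sgd(xs, ys, step=0.1, seed=0):
    """One pass of incremental gradient steps over a randomly permuted dataset.

    Illustrative objective (an assumption, not taken from the paper):
    the average of f_i(w) = 0.5 * (w * x_i - y_i) ** 2 for a scalar w.
    """
    rng = random.Random(seed)
    order = list(range(len(xs)))
    rng.shuffle(order)              # the data is shuffled exactly once
    w = 0.0
    visited = []
    for i in order:                 # sequential pass over the permutation
        grad = (w * xs[i] - ys[i]) * xs[i]
        w -= step * grad
        visited.append(i)
    return w, visited


def with_replacement_sgd(xs, ys, step=0.1, seed=0):
    """Baseline: the same number of steps, each index drawn independently."""
    rng = random.Random(seed)
    n = len(xs)
    w = 0.0
    visited = []
    for _ in range(n):
        i = rng.randrange(n)        # random access; indices may repeat
        grad = (w * xs[i] - ys[i]) * xs[i]
        w -= step * grad
        visited.append(i)
    return w, visited
```

In the single-shuffle variant every data point is used exactly once per pass, while the with-replacement baseline will typically repeat some indices and skip others over the same number of steps.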

algorithm, stochastic gradient method, without-replacement sampling, (9 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)

Without-Replacement Sampling for Stochastic Gradient Methods

Shamir, Ohad

Neural Information Processing Systems, Feb-14-2020, 04:56:30 GMT

Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled *with* replacement. In contrast, sampling *without* replacement is far less understood, yet in practice it is very common, often easier to implement, and usually performs better. In this paper, we provide competitive convergence guarantees for without-replacement sampling under several scenarios, focusing on the natural regime of few passes over the data. Moreover, we describe a useful application of these results in the context of distributed optimization with randomly-partitioned data, yielding a nearly-optimal algorithm for regularized least squares (in terms of both communication complexity and runtime complexity) under broad parameter regimes. Our proof techniques combine ideas from stochastic optimization, adversarial online learning and transductive learning theory, and can potentially be applied to other stochastic optimization and learning problems. Papers published at the Neural Information Processing Systems Conference.
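To make the "randomly-partitioned data" setting in the abstract concrete, here is a minimal sketch of one way such a partition could be produced. Each machine's shard is then a without-replacement sample from the full dataset. The helper name `random_partition` and the round-robin dealing of shuffled indices are assumptions for illustration and are not taken from the paper.

```python
import random


def random_partition(n_points, n_machines, seed=0):
    """Split indices 0..n_points-1 uniformly at random into n_machines shards.

    Hypothetical helper: shuffles once, then deals the shuffled indices
    round-robin, so shard sizes differ by at most one.
    """
    rng = random.Random(seed)
    indices = list(range(n_points))
    rng.shuffle(indices)
    return [indices[k::n_machines] for k in range(n_machines)]
```

Because the shards are disjoint and together cover every index, no data point is assigned to two machines, which is what distinguishes this setting from drawing each machine's data independently.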

optimization, stochastic gradient method, without-replacement sampling, (1 more...)

Neural Information Processing Systems

Industry: Education > Focused Education > Special Education (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)
