AITopics | individual function

Limitations on Variance-Reduction and Acceleration Schemes for Finite Sums Optimization

Yossi Arjevani

Neural Information Processing Systems, Nov-21-2025, 12:03:40 GMT

We study the conditions under which one is able to efficiently apply variance-reduction and acceleration schemes on finite sum optimization problems.

algorithm, artificial intelligence, machine learning, (12 more...)

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > Middle East > Israel (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

R4), relevant to the conference (R2, R4), and is generally on an interesting topic (R1, R2)

Neural Information Processing Systems, Aug-16-2025, 10:40:07 GMT

We thank the reviewers for their work and for the positive evaluation of our paper. R4), relevant to the conference (R2, R4), and is generally on an interesting topic (R1, R2). Thus, we also provided guarantees for SO without strong convexity. Adding a small amount of regularization is also a common practice for numerical stability. Reviewer 2. We appreciate your support of our paper.

assumption, convergence, interesting topic, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.49)

Review for NeurIPS paper: Random Reshuffling: Simple Analysis with Vast Improvements

Neural Information Processing Systems, Feb-5-2025, 23:45:43 GMT

The abstract claims to remove the small step size requirements of prior work. However, to attain a good convergence rate (Corollary 1), the main theorems (Theorems 1 and 2) need a small step size, similar to previous works. In fact, Safran and Shamir (2020) show that convergence is only possible for step size O(1/n). Please modify the claims accordingly. However, the dependence on \mu has worsened.

dependence, theorem 1, theorem 2, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.52)

Limitations on Variance-Reduction and Acceleration Schemes for Finite Sums Optimization

Yossi Arjevani

Neural Information Processing Systems, Oct-4-2024, 00:32:15 GMT

We study the conditions under which one is able to efficiently apply variance-reduction and acceleration schemes on finite sum optimization problems. First, we show that, perhaps surprisingly, the finite sum structure, by itself, is not sufficient for obtaining a complexity bound of Õ((n + L/µ) ln(1/ɛ)) for L-smooth and µ-strongly convex individual functions - one must also know which individual function is being referred to by the oracle at each iteration. Next, we show that for a broad class of first-order and coordinate-descent finite sum algorithms (including, e.g., SDCA, SVRG, SAG), it is not possible to get an 'accelerated' complexity bound of Õ((n + √(nL/µ)) ln(1/ɛ)), unless the strong convexity parameter is given explicitly. Lastly, we show that when this class of algorithms is used for minimizing L-smooth and convex finite sums, the iteration complexity is bounded from below by Ω(n + L/ɛ), assuming that (on average) the same update rule is used in any iteration, and Ω(n + √(nL/ɛ)) otherwise.
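To make the role of the function index concrete, here is a minimal sketch of an SVRG-style update on a toy problem. It is not taken from the paper: the one-dimensional quadratics f_i(x) = 0.5 * c_i * (x - a_i)^2, the helper names, the step size 0.1 / L and the inner-loop length 2n are all assumptions chosen for illustration. The point it shows is that the variance-reduced estimate evaluates the same f_i at the current point and at the snapshot, which is only possible if the index i is known.

```python
import random


def make_problem(n, seed=0):
    """Toy finite sum (assumed for illustration): f_i(x) = 0.5 * c[i] * (x - a[i])**2."""
    rng = random.Random(seed)
    c = [rng.uniform(0.5, 2.0) for _ in range(n)]   # curvatures, so L = max(c)
    a = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # per-function minimizers
    return c, a


def grad_i(x, i, c, a):
    """Gradient of the i-th individual function at x."""
    return c[i] * (x - a[i])


def full_grad(x, c, a):
    """Gradient of the average F(x) = (1/n) * sum_i f_i(x)."""
    n = len(c)
    return sum(grad_i(x, i, c, a) for i in range(n)) / n


def svrg_estimate(x, w, g_w, i, c, a):
    """Variance-reduced estimate: needs f_i at BOTH x and the snapshot w, hence the index i."""
    return grad_i(x, i, c, a) - grad_i(w, i, c, a) + g_w


def svrg(c, a, x0=0.0, epochs=30, seed=1):
    """SVRG-style loop; step size and inner-loop length are illustrative choices."""
    rng = random.Random(seed)
    n = len(c)
    eta = 0.1 / max(c)
    w = x0
    for _ in range(epochs):
        g_w = full_grad(w, c, a)       # one full pass at the snapshot
        x = w
        for _ in range(2 * n):         # inner loop of stochastic steps
            i = rng.randrange(n)
            x -= eta * svrg_estimate(x, w, g_w, i, c, a)
        w = x                          # new snapshot
    return w
```

For this toy problem the exact minimizer is sum(c_i * a_i) / sum(c_i), so the output of `svrg(c, a)` can be compared against it directly. Note that the sketch uses L (through `max(c)`) but never the strong convexity parameter.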

algorithm, iteration complexity, optimization algorithm, (10 more...)

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > Middle East > Israel (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)

Without-Replacement Sampling for Stochastic Gradient Methods

Neural Information Processing Systems, Mar-12-2024, 17:28:54 GMT

Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled with replacement. In contrast, sampling without replacement is far less understood, yet in practice it is very common, often easier to implement, and usually performs better. In this paper, we provide competitive convergence guarantees for without-replacement sampling under several scenarios, focusing on the natural regime of few passes over the data. Moreover, we describe a useful application of these results in the context of distributed optimization with randomly-partitioned data, yielding a nearly-optimal algorithm for regularized least squares (in terms of both communication complexity and runtime complexity) under broad parameter regimes. Our proof techniques combine ideas from stochastic optimization, adversarial online learning and transductive learning theory, and can potentially be applied to other stochastic optimization and learning problems.
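The difference between the two sampling schemes can be seen in a few lines. The sketch below is an illustration only, with hypothetical helper names; it is not the paper's analysis. It draws one pass of n indices each way and measures how many distinct data points were visited. Without replacement every point is visited exactly once, while with replacement the expected fraction of distinct points is 1 - (1 - 1/n)^n, roughly 0.63 for large n.

```python
import random


def with_replacement(n, rng):
    """One 'pass' of n i.i.d. uniform draws from {0, ..., n-1}."""
    return [rng.randrange(n) for _ in range(n)]


def without_replacement(n, rng):
    """One pass over a uniformly random permutation of {0, ..., n-1}."""
    order = list(range(n))
    rng.shuffle(order)
    return order


def coverage(indices, n):
    """Fraction of the n data points that appear at least once in the pass."""
    return len(set(indices)) / n
```

Calling `coverage(without_replacement(n, rng), n)` always gives 1.0, whereas the with-replacement pass typically leaves about a third of the data untouched in any single pass.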

algorithm, optimization, permutation, (15 more...)

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > Middle East > Israel (0.04)

Industry: Education (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.74)

An Optimal Stochastic Algorithm for Decentralized Nonconvex Finite-sum Optimization

Luo, Luo, Ye, Haishan

arXiv.org Artificial Intelligence, Nov-18-2022

This paper studies the decentralized nonconvex optimization problem $\min_{x\in{\mathbb R}^d} f(x)\triangleq \frac{1}{m}\sum_{i=1}^m f_i(x)$, where $f_i(x)\triangleq \frac{1}{n}\sum_{j=1}^n f_{i,j}(x)$ is the local function on the $i$-th agent of the network. We propose a novel stochastic algorithm called DEcentralized probAbilistic Recursive gradiEnt deScenT (DEAREST), which integrates the techniques of variance reduction, gradient tracking and multi-consensus. We construct a Lyapunov function that simultaneously characterizes the function value, the gradient estimation error and the consensus error for the convergence analysis. Based on this measure, we provide a concise proof to show DEAREST requires at most ${\mathcal O}(mn+\sqrt{mn}L\varepsilon^{-2})$ incremental first-order oracle (IFO) calls and ${\mathcal O}({L\varepsilon^{-2}}/{\sqrt{1-\lambda_2(W)}}\,)$ communication rounds to find an $\varepsilon$-stationary point in expectation, where $L$ is the smoothness parameter and $\lambda_2(W)$ is the second-largest eigenvalue of the gossip matrix $W$. Both the IFO complexity and the communication complexity match the lower bounds. To the best of our knowledge, DEAREST is the first optimal algorithm for decentralized nonconvex finite-sum optimization.
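The quantity $\lambda_2(W)$ in the communication bound can be made tangible with a small sketch. Everything below is an assumption for illustration and is not DEAREST itself: the network is a ring of m agents, each averaging with its two neighbours with weight 1/3, and $\lambda_2(W)$ is estimated by power iteration after projecting out the all-ones eigenvector. Repeated multiplication by $W$ is the basic gossip step that multi-consensus schemes build on.

```python
import math


def ring_gossip_matrix(m):
    """Assumed gossip matrix: ring of m >= 3 agents, weight 1/3 on self and both neighbours."""
    W = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in (i - 1, i, i + 1):
            W[i][j % m] += 1.0 / 3.0
    return W


def matvec(W, v):
    """Plain matrix-vector product."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in W]


def second_largest_eigenvalue(W, iters=500):
    """Power iteration on the subspace orthogonal to the all-ones vector.

    Valid for a symmetric doubly stochastic W whose second-largest eigenvalue
    is also the largest in absolute value among the remaining ones.
    """
    m = len(W)
    v = [float(i) for i in range(m)]
    lam = 0.0
    for _ in range(iters):
        mean = sum(v) / m
        v = [vi - mean for vi in v]                  # remove the consensus direction
        norm = math.sqrt(sum(vi * vi for vi in v))
        v = [vi / norm for vi in v]
        Wv = matvec(W, v)
        lam = sum(p * q for p, q in zip(v, Wv))      # Rayleigh quotient
        v = Wv
    return lam


def gossip_rounds(W, x, rounds):
    """Apply `rounds` gossip steps; entries of x approach the network average."""
    for _ in range(rounds):
        x = matvec(W, x)
    return x


def communication_factor(W):
    """The 1 / sqrt(1 - lambda_2(W)) factor appearing in the communication bound."""
    return 1.0 / math.sqrt(1.0 - second_largest_eigenvalue(W))
```

For this ring, the eigenvalues are known in closed form as 1/3 + (2/3) cos(2πk/m), so the estimate can be checked against k = 1. Larger rings give $\lambda_2(W)$ closer to 1 and hence a larger communication factor.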

artificial intelligence, machine learning, optimization, (17 more...)

2210.13931

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

On the Complexity of Minimizing Convex Finite Sums Without Using the Indices of the Individual Functions

Arjevani, Yossi, Daniely, Amit, Jegelka, Stefanie, Lin, Hongzhou

arXiv.org Machine Learning, Feb-8-2020

Recent advances in randomized incremental methods for minimizing $L$-smooth $\mu$-strongly convex finite sums have culminated in tight complexity bounds of $\tilde{O}((n+\sqrt{n L/\mu})\log(1/\epsilon))$ and $O(n+\sqrt{nL/\epsilon})$, for $\mu>0$ and $\mu=0$, respectively, where $n$ denotes the number of individual functions. Unlike incremental methods, stochastic methods for finite sums do not rely on an explicit knowledge of which individual function is being addressed at each iteration, and as such, must perform at least $\Omega(n^2)$ iterations to obtain $O(1/n^2)$-optimal solutions. In this work, we exploit the finite noise structure of finite sums to derive a matching $O(n^2)$-upper bound under the global oracle model, showing that this lower bound is indeed tight. Following a similar approach, we propose a novel adaptation of SVRG which is both compatible with stochastic oracles, and achieves complexity bounds of $\tilde{O}((n^2+n\sqrt{L/\mu})\log(1/\epsilon))$ and $O(n\sqrt{L/\epsilon})$, for $\mu>0$ and $\mu=0$, respectively. Our bounds hold w.h.p. and match in part existing lower bounds of $\tilde{\Omega}(n^2+\sqrt{nL/\mu}\log(1/\epsilon))$ and $\tilde{\Omega}(n^2+\sqrt{nL/\epsilon})$, for $\mu>0$ and $\mu=0$, respectively.

complexity, individual function, oracle, (15 more...)

2002.03273

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > New York > Richmond County > New York City (0.04)
North America > United States > New York > Queens County > New York City (0.04)
(4 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

How Good is SGD with Random Shuffling?

Safran, Itay, Shamir, Ohad

arXiv.org Machine Learning, Jul-31-2019

We study the performance of stochastic gradient descent (SGD) on smooth and strongly-convex finite-sum optimization problems. In contrast to the majority of existing theoretical works, which assume that individual functions are sampled with replacement, we focus here on popular but poorly-understood heuristics, which involve going over random permutations of the individual functions. This setting has been investigated in several recent works, but the optimal error rates remain unclear. In this paper, we provide lower bounds on the expected optimization error with these heuristics (using SGD with any constant step size), which elucidate their advantages and disadvantages. In particular, we prove that after $k$ passes over $n$ individual functions, if the functions are re-shuffled after every pass, the best possible optimization error for SGD is at least $\Omega\left(1/(nk)^2+1/nk^3\right)$, which partially corresponds to recently derived upper bounds, and which we conjecture to be tight. Moreover, if the functions are only shuffled once, then the lower bound increases to $\Omega(1/nk^2)$. Since there are strictly smaller upper bounds for random reshuffling, this proves an inherent performance gap between SGD with single shuffling and repeated shuffling. As a more minor contribution, we also provide a non-asymptotic $\Omega(1/k^2)$ lower bound (independent of $n$) for cyclic gradient descent, where no random shuffling takes place.
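The three orderings compared in the abstract differ only in when the permutation is drawn. The following is a minimal sketch under stated assumptions, not the paper's lower-bound construction: the individual functions are the toy quadratics f_i(x) = 0.5 * (x - a_i)^2, the function name and scheme labels are hypothetical, and the step size is a constant supplied by the caller.

```python
import random

SCHEMES = ("random_reshuffling", "single_shuffle", "cyclic")


def sgd_epochs(a, k, eta, scheme, seed=0):
    """Run k passes of constant-step SGD over f_i(x) = 0.5 * (x - a[i])**2.

    scheme selects the ordering:
      - "random_reshuffling": a fresh permutation before every pass
      - "single_shuffle":     one permutation drawn once and reused
      - "cyclic":             the fixed order 0, 1, ..., n-1 (no randomness)
    """
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme: {scheme}")
    rng = random.Random(seed)
    order = list(range(len(a)))
    if scheme == "single_shuffle":
        rng.shuffle(order)
    x = 0.0
    for _ in range(k):
        if scheme == "random_reshuffling":
            rng.shuffle(order)
        for i in order:
            x -= eta * (x - a[i])   # gradient of f_i at x is (x - a[i])
    return x
```

The minimizer of the average is the mean of `a`, so the optimization error of each scheme is `abs(sgd_epochs(a, k, eta, scheme) - sum(a) / len(a))`. Averaging that error over many seeds is one way to explore the gap between the schemes empirically on this toy problem.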

artificial intelligence, machine learning, sgd, (15 more...)

1908.00045

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)

Limitations on Variance-Reduction and Acceleration Schemes for Finite Sums Optimization

Arjevani, Yossi

Neural Information Processing Systems, Dec-31-2017

We study the conditions under which one is able to efficiently apply variance-reduction and acceleration schemes on finite sums problems. First, we show that, perhaps surprisingly, the finite sum structure, by itself, is not sufficient for obtaining a complexity bound of $\tilde{\mathcal{O}}((n+L/\mu)\ln(1/\epsilon))$ for $L$-smooth and $\mu$-strongly convex finite sums - one must also know exactly which individual function is being referred to by the oracle at each iteration. Next, we show that for a broad class of first-order and coordinate-descent finite sums algorithms (including, e.g., SDCA, SVRG, SAG), it is not possible to get an 'accelerated' complexity bound of $\tilde{\mathcal{O}}((n+\sqrt{n L/\mu})\ln(1/\epsilon))$, unless the strong convexity parameter is given explicitly. Lastly, we show that when this class of algorithms is used for minimizing $L$-smooth and non-strongly convex finite sums, the optimal complexity bound is $\tilde{\mathcal{O}}(n+L/\epsilon)$, assuming that (on average) the same update rule is used for any iteration, and $\tilde{\mathcal{O}}(n+\sqrt{nL/\epsilon})$, otherwise.

algorithm, artificial intelligence, machine learning, (12 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)

Limitations on Variance-Reduction and Acceleration Schemes for Finite Sum Optimization

Arjevani, Yossi

arXiv.org Machine Learning, Dec-6-2017

We study the conditions under which one is able to efficiently apply variance-reduction and acceleration schemes on finite sum optimization problems. First, we show that, perhaps surprisingly, the finite sum structure, by itself, is not sufficient for obtaining a complexity bound of $\tilde{\mathcal{O}}((n+L/\mu)\ln(1/\epsilon))$ for $L$-smooth and $\mu$-strongly convex individual functions - one must also know which individual function is being referred to by the oracle at each iteration. Next, we show that for a broad class of first-order and coordinate-descent finite sum algorithms (including, e.g., SDCA, SVRG, SAG), it is not possible to get an 'accelerated' complexity bound of $\tilde{\mathcal{O}}((n+\sqrt{n L/\mu})\ln(1/\epsilon))$, unless the strong convexity parameter is given explicitly. Lastly, we show that when this class of algorithms is used for minimizing $L$-smooth and convex finite sums, the optimal complexity bound is $\tilde{\mathcal{O}}(n+L/\epsilon)$, assuming that (on average) the same update rule is used in every iteration, and $\tilde{\mathcal{O}}(n+\sqrt{nL/\epsilon})$, otherwise.

algorithm, artificial intelligence, machine learning, (14 more...)

1706.01686

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.71)
