AITopics | ln 3

Optimization under heavy-tailed noise has become popular recently, since it better fits many modern machine learning tasks, as captured by empirical observations. Concretely, instead of a finite second moment on gradient noise, a bounded ${\frak p}$-th moment where ${\frak p}\in(1,2]$ has been recognized to be more realistic (say being upper bounded by $σ_{\frak l}^{\frak p}$ for some $σ_{\frak l}\ge0$). A simple yet effective operation, gradient clipping, is known to handle this new challenge successfully. Specifically, Clipped Stochastic Gradient Descent (Clipped SGD) guarantees a high-probability rate ${\cal O}(σ_{\frak l}\ln(1/δ)T^{1/{\frak p}-1})$ (resp. ${\cal O}(σ_{\frak l}^2\ln^2(1/δ)T^{2/{\frak p}-2})$) for nonsmooth convex (resp. strongly convex) problems, where $δ\in(0,1]$ is the failure probability and $T\in\mathbb{N}$ is the time horizon. In this work, we provide a refined analysis for Clipped SGD and offer two faster rates, ${\cal O}(σ_{\frak l}d_{\rm eff}^{-1/2{\frak p}}\ln^{1-1/{\frak p}}(1/δ)T^{1/{\frak p}-1})$ and ${\cal O}(σ_{\frak l}^2d_{\rm eff}^{-1/{\frak p}}\ln^{2-2/{\frak p}}(1/δ)T^{2/{\frak p}-2})$, than the aforementioned best results, where $d_{\rm eff}\ge1$ is a quantity we call the $\textit{generalized effective dimension}$. Our analysis improves upon the existing approach on two sides: better utilization of Freedman's inequality and finer bounds for clipping error under heavy-tailed noise. In addition, we extend the refined analysis to convergence in expectation and obtain new rates that break the known lower bounds. Lastly, to complement the study, we establish new lower bounds for both high-probability and in-expectation convergence. Notably, the in-expectation lower bounds match our new upper bounds, indicating the optimality of our refined analysis for convergence in expectation.

artificial intelligence, clipped sgd, machine learning, (14 more...)

arXiv.org Machine Learning

2512.23178

Country: North America > United States (0.27)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games with Bandit Feedback

Neural Information Processing SystemsOct-8-2025, 21:53:22 GMT

Our paper takes the first step to remove such assumptions.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.14)
North America > United States > Virginia (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Workflow (0.48)

Industry: Leisure & Entertainment (0.45)

Technology:

Information Technology > Game Theory (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Add feedback

c00193e70e8e27e70601b26161b4ae86-Supplemental.pdf

Neural Information Processing SystemsAug-17-2025, 04:32:58 GMT

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.49)
Information Technology > Data Science > Data Mining (0.47)

Add feedback

a18aa23ee676d7f5ffb34cf16df3e08c-Supplemental.pdf

Neural Information Processing SystemsAug-15-2025, 12:39:58 GMT

algorithm, relation hold, value update, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.67)

Add feedback

Gradient Compressed Sensing: A Query-Efficient Gradient Estimator for High-Dimensional Zeroth-Order Optimization

Qiu, Ruizhong, Tong, Hanghang

arXiv.org Artificial IntelligenceMay-26-2024

We study nonconvex zeroth-order optimization (ZOO) in a high-dimensional space $\mathbb R^d$ for functions with approximately $s$-sparse gradients. To reduce the dependence on the dimensionality $d$ in the query complexity, high-dimensional ZOO methods seek to leverage gradient sparsity to design gradient estimators. The previous best method needs $O\big(s\log\frac ds\big)$ queries per step to achieve $O\big(\frac1T\big)$ rate of convergence w.r.t. the number T of steps. In this paper, we propose *Gradient Compressed Sensing* (GraCe), a query-efficient and accurate estimator for sparse gradients that uses only $O\big(s\log\log\frac ds\big)$ queries per step and still achieves $O\big(\frac1T\big)$ rate of convergence. To our best knowledge, we are the first to achieve a *double-logarithmic* dependence on $d$ in the query complexity under weaker assumptions. Our proposed GraCe generalizes the Indyk--Price--Woodruff (IPW) algorithm in compressed sensing from linear measurements to nonlinear functions. Furthermore, since the IPW algorithm is purely theoretical due to its impractically large constant, we improve the IPW algorithm via our *dependent random partition* technique together with our corresponding novel analysis and successfully reduce the constant by a factor of nearly 4300. Our GraCe is not only theoretically query-efficient but also achieves strong empirical performance. We benchmark our GraCe against 12 existing ZOO methods with 10000-dimensional functions and demonstrate that GraCe significantly outperforms existing methods.

gradient compressed sensing, ln 3, query-efficient gradient estimator, (7 more...)

arXiv.org Artificial Intelligence

2405.16805

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Massachusetts (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Learning-based Scheduling for Information Accuracy and Freshness in Wireless Networks

Gudwani, Hitesh

arXiv.org Artificial IntelligenceOct-24-2023

We consider a system of multiple sources, a single communication channel, and a single monitoring station. Each source measures a time-varying quantity with varying levels of accuracy and one of them sends its update to the monitoring station via the channel. The probability of success of each attempted communication is a function of the source scheduled for transmitting its update. Both the probability of correct measurement and the probability of successful transmission of all the sources are unknown to the scheduler. The metric of interest is the reward received by the system which depends on the accuracy of the last update received by the destination and the Age-of-Information (AoI) of the system. We model our scheduling problem as a variant of the multi-arm bandit problem with sources as different arms. We compare the performance of all $4$ standard bandit policies, namely, ETC, $\epsilon$-greedy, UCB, and TS suitably adjusted to our system model via simulations. In addition, we provide analytical guarantees of $2$ of these policies, ETC, and $\epsilon$-greedy. Finally, we characterize the lower bound on the cumulative regret achievable by any policy.

ln 3, probability, time slot, (15 more...)

arXiv.org Artificial Intelligence

2310.15705

Country: Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)

Genre: Research Report (0.63)

Industry: Health & Medicine (0.36)

Technology:

Information Technology > Communications > Networks (0.64)
Information Technology > Artificial Intelligence > Machine Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.34)
Information Technology > Data Science > Data Mining > Big Data (0.34)

Add feedback

Minimax Weight Learning for Absorbing MDPs

Li, Fengyin, Li, Yuqiang, Wu, Xianyi

arXiv.org Artificial IntelligenceSep-5-2023

Reinforcement learning policy evaluation problems are often modeled as finite or discounted/averaged infinite-horizon MDPs. In this paper, we study undiscounted off-policy policy evaluation for absorbing MDPs. Given the dataset consisting of the i.i.d episodes with a given truncation level, we propose a so-called MWLA algorithm to directly estimate the expected return via the importance ratio of the state-action occupancy measure. The Mean Square Error (MSE) bound for the MWLA method is investigated and the dependence of statistical errors on the data size and the truncation level are analyzed. With an episodic taxi environment, computational experiments illustrate the performance of the MWLA algorithm.

algorithm, mdp, theorem 4, (16 more...)

arXiv.org Artificial Intelligence

2301.03183

Country:

North America > United States > New York (0.04)
Asia > Japan (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.41)

Add feedback

Multi-Step Greedy and Approximate Real Time Dynamic Programming

Efroni, Yonathan, Ghavamzadeh, Mohammad, Mannor, Shie

arXiv.org Machine LearningSep-9-2019

Real Time Dynamic Programming (RTDP) is a well-known Dynamic Programming (DP) based algorithm that combines planning and learning to find an optimal policy for an MDP. It is a planning algorithm because it uses the MDP's model (reward and transition functions) to calculate a 1-step greedy policy w.r.t.~an optimistic value function, by which it acts. It is a learning algorithm because it updates its value function only at the states it visits while interacting with the environment. As a result, unlike DP, RTDP does not require uniform access to the state space in each iteration, which makes it particularly appealing when the state space is large and simultaneously updating all the states is not computationally feasible. In this paper, we study a generalized multi-step greedy version of RTDP, which we call $h$-RTDP, in its exact form, as well as in three approximate settings: approximate model, approximate value updates, and approximate state abstraction. We analyze the sample, computation, and space complexities of $h$-RTDP and establish that increasing $h$ improves sample and space complexity, with the cost of additional offline computational operations. For the approximate cases, we prove that the asymptotic performance of $h$-RTDP is the same as that of a corresponding approximate DP -- the best one can hope for without further assumptions on the approximation errors. $h$-RTDP is the first algorithm with a provably improved sample complexity when increasing the lookahead horizon.

artificial intelligence, nh 1, planning & scheduling, (20 more...)

arXiv.org Machine Learning

1909.04236

Genre: Research Report (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.67)

Add feedback

Filters

Collaborating Authors

ln 3

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games with Bandit Feedback

a18aa23ee676d7f5ffb34cf16df3e08c-Supplemental.pdf

Clipped Gradient Methods for Nonsmooth Convex Optimization under Heavy-Tailed Noise: A Refined Analysis

Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games with Bandit Feedback

c00193e70e8e27e70601b26161b4ae86-Supplemental.pdf

a18aa23ee676d7f5ffb34cf16df3e08c-Supplemental.pdf

Gradient Compressed Sensing: A Query-Efficient Gradient Estimator for High-Dimensional Zeroth-Order Optimization

Learning-based Scheduling for Information Accuracy and Freshness in Wireless Networks

Minimax Weight Learning for Absorbing MDPs

Multi-Step Greedy and Approximate Real Time Dynamic Programming