AITopics | regret bound

Collaborating Authors

regret bound

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Supplementary material for " Regret Bounds for Multilabel Classification in Sparse Label Regimes "

Neural Information Processing SystemsMay-1-2026, 01:52:54 GMT

This appendix contains all proofs of the results mentioned in the main body of the paper, plus further results which have been omitted there due to space limits. We recall the following lemma which upper bounds the probability measure of the ball around a point x X that contains its kth nearest neighbors. The proof immediately follows from the multiplicative Chernoff bound (see, e.g., Lemma 3.2 in [28]). When combined with Assumption 5.1 we obtain the following corollary. Corollary A.2. Suppose that the measure-smoothness assumption (Assumption 5.1) holds with parameters λ, Cλ, k k.

artificial intelligence, machine learning, probability, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Improved Bayes Regret Bounds for Multi-Task Hierarchical Bayesian Bandit Algorithms Jiechao Guan 1 Hui Xiong 1,2, 1 AI Thrust, The Hong Kong University of Science and Technology (Guangzhou), China

Neural Information Processing SystemsFeb-16-2026, 08:13:58 GMT

Hierarchical Bayesian bandit refers to the multi-task bandit problem in which bandit tasks are assumed to be drawn from the same distribution.

bayes regret, data mining, machine learning, (20 more...)

Neural Information Processing Systems

Country:

Asia > China > Guangdong Province > Guangzhou (0.40)
Asia > China > Hong Kong (0.40)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Experimental Study (0.92)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Blocking

Neural Information Processing SystemsFeb-12-2026, 20:47:12 GMT

Weconsider makes situations product onmachines).

artificial intelligence, machine learning, neural information processing system, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > Hawaii (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.71)

Add feedback

Online Learning for Uninformed Markov Games: Empirical Nash-Value Regret and Non-Stationarity Adaptation

Liu, Junyan, Luo, Haipeng, Zhang, Zihan, Ratliff, Lillian J.

arXiv.org Machine LearningFeb-10-2026

We study online learning in two-player uninformed Markov games, where the opponent's actions and policies are unobserved. In this setting, Tian et al. (2021) show that achieving no-external-regret is impossible without incurring an exponential dependence on the episode length $H$. They then turn to the weaker notion of Nash-value regret and propose a V-learning algorithm with regret $O(K^{2/3})$ after $K$ episodes. However, their algorithm and guarantee do not adapt to the difficulty of the problem: even in the case where the opponent follows a fixed policy and thus $O(\sqrt{K})$ external regret is well-known to be achievable, their result is still the worse rate $O(K^{2/3})$ on a weaker metric. In this work, we fully address both limitations. First, we introduce empirical Nash-value regret, a new regret notion that is strictly stronger than Nash-value regret and naturally reduces to external regret when the opponent follows a fixed policy. Moreover, under this new metric, we propose a parameter-free algorithm that achieves an $O(\min \{\sqrt{K} + (CK)^{1/3},\sqrt{LK}\})$ regret bound, where $C$ quantifies the variance of the opponent's policies and $L$ denotes the number of policy switches (both at most $O(K)$). Therefore, our results not only recover the two extremes -- $O(\sqrt{K})$ external regret when the opponent is fixed and $O(K^{2/3})$ Nash-value regret in the worst case -- but also smoothly interpolate between these extremes by automatically adapting to the opponent's non-stationarity. We achieve so by first providing a new analysis of the epoch-based V-learning algorithm by Mao et al. (2022), establishing an $O(ηC + \sqrt{K/η})$ regret bound, where $η$ is the epoch incremental factor. Next, we show how to adaptively restart this algorithm with an appropriate $η$ in response to the potential non-stationarity of the opponent, eventually achieving our final results.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

2602.07205

Country:

North America > United States > California (0.14)
Asia > Middle East > Jordan (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Leisure & Entertainment > Games (0.67)
Education > Educational Setting > Online (0.60)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.60)

Add feedback

0f0e13216262f4a201bec128044dd30f-AuthorFeedback.pdf

Neural Information Processing SystemsFeb-7-2026, 11:56:59 GMT

algorithm, final version, reviewer, (5 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.75)

Add feedback

Regret Bounds for Learning State Representations in Reinforcement Learning

Neural Information Processing SystemsDec-25-2025, 19:02:12 GMT

We consider the problem of online reinforcement learning when several state representations (mapping histories to a discrete state space) are available to the learning agent. At least one of these representations is assumed to induce a Markov decision process (MDP), and the performance of the agent is measured in terms of cumulative regret against the optimal policy giving the highest average reward in this MDP representation. We propose an algorithm (UCB-MS) with O(sqrt(T)) regret in any communicating Markov decision process. The regret bound shows that UCB-MS automatically adapts to the Markov model. This improves over the currently known best results in the literature that gave regret bounds of order O(T^(2/3)).

learning state representation, name change, regret bound, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.86)

Add feedback

Regret Bounds for Risk-Sensitive Reinforcement Learning

Neural Information Processing SystemsDec-25-2025, 15:18:22 GMT

In safety-critical applications of reinforcement learning such as healthcare and robotics, it is often desirable to optimize risk-sensitive objectives that account for tail outcomes rather than expected reward. We prove the first regret bounds for reinforcement learning under a general class of risk-sensitive objectives including the popular CVaR objective. Our theory is based on a novel characterization of the CVaR objective as well as a novel optimistic MDP construction.

name change, regret bound, risk-sensitive reinforcement learning, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)

Add feedback

Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems

Neural Information Processing SystemsDec-25-2025, 04:41:17 GMT

Restless bandit problems are instances of non-stationary multi-armed bandits. These problems have been studied well from the optimization perspective, where the goal is to efficiently find a near-optimal policy when system parameters are known. However, very few papers adopt a learning perspective, where the parameters are unknown. In this paper, we analyze the performance of Thompson sampling in episodic restless bandits with unknown parameters. We consider a general policy map to define our competitor and prove an $\tilde{\bigO}(\sqrt{T})$ Bayesian regret bound. Our competitor is flexible enough to represent various benchmarks including the best fixed action policy, the optimal policy, the Whittle index policy, or the myopic policy.

episodic restless bandit problem, regret bound, thompson sampling, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.66)
Information Technology > Artificial Intelligence > Machine Learning (0.46)

Add feedback

Variational Bayesian Reinforcement Learning with Regret Bounds

Neural Information Processing SystemsDec-25-2025, 04:40:33 GMT

We consider the exploration-exploitation trade-off in reinforcement learning and show that an agent endowed with an exponential epistemic-risk-seeking utility function explores efficiently, as measured by regret. The state-action values induced by the exponential utility satisfy a Bellman recursion, so we can use dynamic programming to compute them. We call the resulting algorithm K-learning (for knowledge) and the risk-seeking utility ensures that the associated state-action values (K-values) are optimistic for the expected optimal Q-values under the posterior. The exponential utility function induces a Boltzmann exploration policy for which the'temperature' parameter is equal to the risk-seeking parameter and is carefully controlled to yield a Bayes regret bound of $\tilde O(L^{3/2} \sqrt{S A T})$, where $L$ is the time horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the total number of elapsed timesteps. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice.

name change, regret bound, variational bayesian reinforcement learning, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)

Add feedback

Failure-Aware Gaussian Process Optimization with Regret Bounds

Neural Information Processing SystemsDec-25-2025, 02:53:07 GMT

Real-world optimization problems often require black-box optimization with observation failure, where we can obtain the objective function value if we succeed, otherwise, we can only obtain a fact of failure. Moreover, this failure region can be complex by several latent constraints, whose number is also unknown. For this problem, we propose a failure-aware Gaussian process upper confidence bound (F-GP-UCB), which only requires a mild assumption for the observation failure that an optimal solution lies on an interior of a feasible region. Furthermore, we show that the number of successful observations grows linearly, by which we provide the first regret upper bounds and the convergence of F-GP-UCB. We demonstrate the effectiveness of F-GP-UCB in several benchmark functions, including the simulation function motivated by material synthesis experiments.

failure-aware gaussian process optimization, name change, regret bound, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.61)

Add feedback