AITopics | skh

Collaborating Authors

skh

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Tighter Regret Bounds for Contextual Action-Set Reinforcement Learning

Chen, Zijun, Zhang, Zihan

arXiv.org Machine LearningMay-18-2026

We study episodic reinforcement learning with fixed reward and transition functions, but with episode-dependent admissible action sets that are observed at the start of each episode. Performance is measured by cumulative regret against the episode-wise optimal value, $\sum_{k=1}^K [V^{*,M^k} - V^{π^k,M^k}]$, where $M^k$ represents the action context in the $k$-th episode. We show that the MVP algorithm naturally extends to this framework and enjoys strong theoretical guarantees. In particular, we establish a minimax regret bound of $\widetilde{O}(\sqrt{SAH^3K\log L})$ for adversarial contexts, where $L$ denotes the number of possible contexts. This result implies a regret bound of $\widetilde{O}(\sqrt{SAH^3K})$ for stochastic contexts. We further translate the stochastic regret guarantee into a sample complexity bound of $\widetilde{O}(SAH^3/ε^2)$ for a fixed context distribution. In addition, we derive a gap-dependent regret bound of \[ \widetilde O\left( \inf_{p\in [0,1)} \left( \frac{1}{Δ_{\min}^{p}} + pKΔ_{\min}^{p} \right)\log K \cdot \mathrm{poly}(S,A,H) \right), \] where $Δ_{\min}^{p}$ is the global $p$-trimmed positive-gap floor over suboptimal $(h,s,a)$ triples. This bound can substantially improve upon the minimax rate when the relevant suboptimality gaps are large.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

2605.15692

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.54)

Add feedback

Unified Framework of Distributional Regret in Multi-Armed Bandits and Reinforcement Learning

Lee, Harin, Oh, Min-hwan

arXiv.org Machine LearningMay-8-2026

We study the distribution of regret in stochastic multi-armed bandits and episodic reinforcement learning through a unified framework. We formalize a distributional regret bound as a probabilistic guarantee that holds uniformly over all confidence levels $δ\in (0,1]$, thereby characterizing the regret distribution across the full range of $δ$. We present a simple UCBVI-style algorithm with exploration bonus $\min\{c_{1,k}/N, c_{2,k}/\sqrt{N}\}$, where $N$ denotes the visit count and $(c_{1,k},c_{2,k})$ are user-specified parameters. For arbitrary parameter sequences, we derive general gap-independent and gap-dependent distributional regret bounds, yielding a principled characterization of how the parameters control the trade-off between expected performance, tail risk, and instance-dependent behavior. In particular, our bounds achieve optimal trade-offs between expected and distributional regret in both minimax and instance-dependent regimes. As a special case, for multi-armed bandits with $A$ arms and horizon $T$, we obtain a distributional regret bound of order $\mathcal{O}(\sqrt{AT}\log(1/δ))$, confirming the conjecture of Lattimore & Szepesvári (2020, Section 17.1) for the first time.

data mining, machine learning, reinforcement learning, (22 more...)

arXiv.org Machine Learning

2605.05102

Genre: Research Report (0.81)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.84)

Add feedback

ComputationallyEfficientHorizon-Free ReinforcementLearningforLinearMixtureMDPs

Neural Information Processing SystemsFeb-12-2026, 16:12:01 GMT

How to design efficient algorithms is a central problem for reinforcement learning (RL).

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)

Add feedback

Near-OptimalRegretforAdversarialMDPwith DelayedBanditFeedback

Neural Information Processing SystemsFeb-12-2026, 05:29:44 GMT

The standard assumption in reinforcement learning (RL) is that agents observe feedback for their actions immediately. However, in practice feedback is often observedindelay.

machine learning, qkh, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Nevada (0.04)
North America > Canada > Quebec > Montreal (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.34)

Add feedback

Worst-Case Regret Bounds for Exploration via Randomized Value Functions

Daniel Russo

Neural Information Processing SystemsFeb-12-2026, 01:03:50 GMT

This paper studies a recent proposal to use randomized value functions to drive exploration in reinforcement learning.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.35)

Add feedback

ProvablyEfficientCausalReinforcementLearning withConfoundedObservationalData

Neural Information Processing SystemsFeb-10-2026, 17:59:49 GMT

Empowered by neural networks, deep reinforcement learning (DRL) achieves tremendous empirical success. However, DRL requires a large dataset by interacting with the environment, which is unrealistic in critical scenarios such as autonomous driving and personalized medicine. In this paper, we study how to incorporate the dataset collected in the offline setting to improve the sample efficiency in the online setting. To incorporate the observational data, we face two challenges.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.04)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

OnReward-FreeReinforcementLearningwith LinearFunctionApproximation

Neural Information Processing SystemsFeb-10-2026, 11:12:45 GMT

During the exploration phase, an agent collects samples without using a pre-specified reward function. After the exploration phase, a reward function is given, and the agent uses samples collected during the exploration phase to computeanear-optimalpolicy.

artificial intelligence, exploration phase, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

p |S|3|A|K) 0 OptPess-PrimalDual O(H

Neural Information Processing SystemsFeb-9-2026, 21:02:01 GMT

We address the issue of safety in reinforcement learning. We pose the problem in an episodic framework of a constrained Markov decision process.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Brazos County > College Station (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)

Add feedback

Uniform-PACBoundsforReinforcementLearning withLinearFunctionApproximation

Neural Information Processing SystemsFeb-9-2026, 10:14:58 GMT

Designing efficient reinforcement learning (RL) algorithms for environments with large state and action spaces is one of the main tasks in the RL community.

akih, machine learning, skih, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
Europe > United Kingdom > England (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Add feedback

ProvablyEfficientReinforcementLearningwith LinearFunctionApproximationunderAdaptivity Constraints

Neural Information Processing SystemsFeb-9-2026, 07:55:23 GMT

Real-world reinforcement learning (RL) applications often come with possibly infinite state and action space, and in such a situation classical RL algorithms developed in the tabular setting are not applicable anymore. A popular approach to overcoming this issue is by applying function approximation techniques to the underlying structures of the Markovdecision processes (MDPs).

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)

Add feedback