AITopics | diameter

Optimistic posterior sampling for reinforcement learning: worst-case regret bounds

Neural Information Processing Systems Mar-17-2026, 13:31:40 GMT

We present an algorithm based on posterior sampling (aka Thompson sampling) that achieves near-optimal worst-case regret bounds when the underlying Markov Decision Process (MDP) is communicating with a finite, though unknown, diameter. Our main result is a high-probability regret upper bound of $\tilde{O}(D\sqrt{SAT})$ for any communicating MDP with $S$ states, $A$ actions and diameter $D$, when $T\ge S^5A$. Here, regret compares the total reward achieved by the algorithm to the total expected reward of an optimal infinite-horizon undiscounted average-reward policy over a time horizon $T$. This result improves over the best previously known upper bound of $\tilde{O}(DS\sqrt{AT})$ achieved by any algorithm in this setting, and matches the dependence on $S$ in the established lower bound of $\Omega(\sqrt{DSAT})$ for this problem. Our techniques involve proving some novel results about the anti-concentration of the Dirichlet distribution, which may be of independent interest.
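
As a quick check on how the three rates quoted above relate (simple algebra on the stated bounds, ignoring logarithmic factors; this is a worked comparison, not a derivation from the paper):

$$\frac{DS\sqrt{AT}}{D\sqrt{SAT}} = \sqrt{S}, \qquad \frac{D\sqrt{SAT}}{\sqrt{DSAT}} = \sqrt{D}.$$

So the new upper bound saves a factor of $\sqrt{S}$ over the previous one, and it exceeds the lower bound only by a factor of $\sqrt{D}$, up to logarithmic terms.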

artificial intelligence, machine learning, proceedings, (8 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

f69041d874533096748e2d77480c1fea-AuthorFeedback.pdf

Neural Information Processing Systems Feb-15-2026, 03:50:21 GMT

algorithm, efficiency, reward function, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.34)

Nonparametric Contextual Bandits in Metric Spaces with Unknown Metric

Nirandika Wanigasekara, Christina Yu

Neural Information Processing Systems Feb-13-2026, 13:36:39 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, bandit, reward function, (15 more...)

Country:

Asia > Singapore (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.51)
Information Technology > Data Science > Data Mining > Big Data (0.48)

Regret Bounds for Learning State Representations in Reinforcement Learning

Ronald Ortner, Matteo Pirotta, Alessandro Lazaric, Ronan Fruit, Odalric-Ambrym Maillard

Neural Information Processing Systems Feb-13-2026, 03:37:14 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, markov model, representation, (16 more...)

Country:

North America > United States (0.04)
North America > Canada (0.04)
Europe > France > Hauts-de-France > Pas-de-Calais (0.04)
Europe > Austria > Styria > Leoben (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)

Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes

Neural Information Processing Systems Feb-12-2026, 15:41:04 GMT

Reinforcement learning (RL) [1] studies the problem of learning in sequential decision-making problems where the dynamics of the environment is unknown, but can be learnt by performing actions and observing their outcome in an online fashion. A sample-efficient RL agent must trade off the exploration needed to collect information about the environment, and the exploitation of the experience gathered so far to gain as much reward as possible.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Country: