Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

Dec-25-2025, 19:13:59 GMT–Neural Information Processing Systems

We present an algorithm based on the \emph{Optimism in the Face of Uncertainty} (OFU) principle that efficiently learns reinforcement learning (RL) problems modeled as Markov decision processes (MDPs) with finite state-action spaces. By evaluating the state-pair difference of the optimal bias function $h^{*}$, the proposed algorithm achieves a regret bound of $\tilde{O}(\sqrt{SATH})$\footnote{The symbol $\tilde{O}$ means $O$ with log factors ignored.}
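To get a feel for how a bound of this form scales, the sketch below evaluates $\sqrt{SATH}$ numerically. It is only an illustration, not part of the paper: the function name `regret_bound_scale` is hypothetical, constants and log factors are dropped, and the reading of the symbols ($S$ states, $A$ actions, $T$ time steps, $H$ a bound related to the span of $h^{*}$) is an assumption, since the abstract is cut off before it defines them.

```python
import math


def regret_bound_scale(S: int, A: int, T: int, H: float) -> float:
    """Return sqrt(S * A * T * H), the leading term of the stated bound.

    Assumed meanings (not confirmed by the truncated abstract):
    S = number of states, A = number of actions, T = number of steps,
    H = a bound related to the span of the optimal bias function h*.
    Constants and logarithmic factors are ignored.
    """
    if min(S, A, T) <= 0 or H <= 0:
        raise ValueError("S, A, T and H must all be positive")
    return math.sqrt(S * A * T * H)


def average_regret_scale(S: int, A: int, T: int, H: float) -> float:
    """Per-step regret scale: the bound divided by T, which shrinks as T grows."""
    return regret_bound_scale(S, A, T, H) / T
```

Because the bound grows like $\sqrt{T}$, the per-step quantity returned by `average_regret_scale` decays like $1/\sqrt{T}$, which is the sense in which a sublinear regret bound implies the learner's average performance approaches the optimum.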

name change, regret minimization, reinforcement learning, (9 more...)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)