AITopics | dsat

Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

Neural Information Processing Systems | Dec-25-2025, 19:13:59 GMT

We present an algorithm based on the \emph{Optimism in the Face of Uncertainty} (OFU) principle that efficiently learns reinforcement learning (RL) problems modeled as Markov decision processes (MDPs) with finite state-action spaces. By evaluating the state-pair difference of the optimal bias function $h^{*}$, the proposed algorithm achieves a regret bound of $\tilde{O}(\sqrt{SATH})$\footnote{The symbol $\tilde{O}$ means $O$ with log factors ignored.}.

name change, regret minimization, reinforcement learning, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)

Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

Neural Information Processing Systems | Oct-10-2024, 14:59:51 GMT

We present an algorithm based on the \emph{Optimism in the Face of Uncertainty} (OFU) principle that efficiently learns reinforcement learning (RL) problems modeled as Markov decision processes (MDPs) with finite state-action spaces. By evaluating the state-pair difference of the optimal bias function $h^{*}$, the proposed algorithm achieves a regret bound of $\tilde{O}(\sqrt{SATH})$\footnote{The symbol $\tilde{O}$ means $O$ with log factors ignored.}. Furthermore, this regret bound matches the lower bound of $\Omega(\sqrt{SATH})$ \cite{jaksch2010near} up to a logarithmic factor. As a consequence, we show that there is a near-optimal regret bound of $\tilde{O}(\sqrt{DSAT})$ for MDPs with finite diameter $D$, compared to the lower bound of $\Omega(\sqrt{DSAT})$ \cite{jaksch2010near}.

optimal bias function, regret minimization, reinforcement learning, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

Near-optimal Regret Bounds for Reinforcement Learning

Neural Information Processing Systems | Feb-16-2024, 13:07:00 GMT

For undiscounted reinforcement learning in Markov decision processes (MDPs), we consider the total regret of a learning algorithm with respect to an optimal policy. To describe the transition structure of an MDP, we propose a new parameter: an MDP has diameter $D$ if for any pair of states s1, s2 there is a policy which moves from s1 to s2 in at most $D$ steps (on average). We present a reinforcement learning algorithm with total regret $\tilde{O}(DS\sqrt{AT})$ after $T$ steps for any unknown MDP with $S$ states, $A$ actions per state, and diameter $D$. This bound holds with high probability. We also present a corresponding lower bound of $\Omega(\sqrt{DSAT})$ on the total regret of any learning algorithm. Both bounds demonstrate the utility of the diameter as a structural parameter of the MDP.

algorithm, near-optimal regret bound, reinforcement learning, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

Zhang, Zihan, Ji, Xiangyang

Neural Information Processing Systems | Mar-18-2020, 21:32:22 GMT

We present an algorithm based on the \emph{Optimism in the Face of Uncertainty} (OFU) principle that efficiently learns reinforcement learning (RL) problems modeled as Markov decision processes (MDPs) with finite state-action spaces. By evaluating the state-pair difference of the optimal bias function $h^{*}$, the proposed algorithm achieves a regret bound of $\tilde{O}(\sqrt{SATH})$\footnote{The symbol $\tilde{O}$ means $O$ with log factors ignored.}. This result outperforms the best previous regret bound $\tilde{O}(HS\sqrt{AT})$ \cite{bartlett2009regal} by a factor of $\sqrt{SH}$. Furthermore, this regret bound matches the lower bound of $\Omega(\sqrt{SATH})$ \cite{jaksch2010near} up to a logarithmic factor. As a consequence, we show that there is a near-optimal regret bound of $\tilde{O}(\sqrt{DSAT})$ for MDPs with finite diameter $D$, compared to the lower bound of $\Omega(\sqrt{DSAT})$ \cite{jaksch2010near}.
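
The improvement factor quoted in this abstract can be checked directly from the two bounds it states. The following is a worked ratio added for illustration, not a derivation taken from the paper:

$$\frac{HS\sqrt{AT}}{\sqrt{SATH}} = \frac{HS\sqrt{AT}}{\sqrt{SH}\,\sqrt{AT}} = \frac{HS}{\sqrt{SH}} = \sqrt{SH}$$

The diameter result follows the same pattern under one assumption that the abstract does not spell out: that $H$ is an upper bound on the span of $h^{*}$, and that this span is at most $D$ (a standard fact when rewards lie in $[0,1]$). Choosing $H = D$ then gives

$$\tilde{O}(\sqrt{SATH})\,\Big|_{H = D} = \tilde{O}(\sqrt{DSAT}).$$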

optimal bias function, regret minimization, reinforcement learning, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

Near-optimal Regret Bounds for Reinforcement Learning

Auer, Peter, Jaksch, Thomas, Ortner, Ronald

Neural Information Processing Systems | Feb-15-2020, 01:11:16 GMT

For undiscounted reinforcement learning in Markov decision processes (MDPs), we consider the total regret of a learning algorithm with respect to an optimal policy. To describe the transition structure of an MDP, we propose a new parameter: an MDP has diameter $D$ if for any pair of states s1, s2 there is a policy which moves from s1 to s2 in at most $D$ steps (on average). We present a reinforcement learning algorithm with total regret $\tilde{O}(DS\sqrt{AT})$ after $T$ steps for any unknown MDP with $S$ states, $A$ actions per state, and diameter $D$. This bound holds with high probability. We also present a corresponding lower bound of $\Omega(\sqrt{DSAT})$ on the total regret of any learning algorithm. Both bounds demonstrate the utility of the diameter as a structural parameter of the MDP.
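
The diameter definition in this abstract can be made concrete with a small sketch. It assumes a hypothetical tabular encoding `P[s][a][s2]` of known transition probabilities, and the names `min_expected_hitting_times` and `diameter` are illustrative, not from the paper. For each target state it runs value iteration on the minimal expected hitting time, then takes the maximum over all ordered state pairs.

```python
from typing import List

# P[s][a][s2] = probability of moving from state s to s2 under action a
# (hypothetical encoding chosen for this sketch).
Transition = List[List[List[float]]]


def min_expected_hitting_times(P: Transition, target: int,
                               tol: float = 1e-10,
                               max_iter: int = 100_000) -> List[float]:
    """Minimal expected number of steps to reach `target` from each state."""
    n = len(P)
    h = [0.0] * n
    for _ in range(max_iter):
        new_h = [0.0] * n
        for s in range(n):
            if s == target:
                continue  # the hitting time from the target itself is 0
            # One step is always paid, then the best action is followed.
            new_h[s] = 1.0 + min(
                sum(p * h[s2] for s2, p in enumerate(row)) for row in P[s]
            )
        if max(abs(a - b) for a, b in zip(new_h, h)) < tol:
            return new_h
        h = new_h
    raise RuntimeError("no convergence; some state may be unreachable")


def diameter(P: Transition) -> float:
    """Largest minimal expected hitting time over ordered pairs s != t."""
    n = len(P)
    return max(
        min_expected_hitting_times(P, t)[s]
        for t in range(n) for s in range(n) if s != t
    )


# Two states; action 0 stays put, action 1 switches state with probability 0.5.
EXAMPLE_P: Transition = [
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.0, 1.0], [0.5, 0.5]],
]

if __name__ == "__main__":
    print(round(diameter(EXAMPLE_P), 6))
```

In the example, the best policy attempts the switch at every step and succeeds with probability 0.5, so the expected travel time between the two states is 2 steps and the sketch reports a diameter of about 2. This only illustrates the definition for a known model; the learning algorithm in the paper does not have access to `P`.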

algorithm, near-optimal regret bound, reinforcement learning, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

DSAT - First Ever Adaptive Learning Platform for Data Science

#artificialintelligence | Nov-28-2019, 07:38:23 GMT

Every once in a while, a revolutionary product comes along that changes everything.

adaptive learning platform, data science, dsat, (11 more...)

#artificialintelligence

Country:

North America > United States > Pennsylvania (0.05)
North America > United States > Michigan (0.05)
North America > United States > Illinois > Cook County > Chicago (0.05)

Genre: Overview > Innovation (0.36)

Technology:

Information Technology > Data Science (0.75)
Information Technology > Artificial Intelligence > Machine Learning (0.40)
