AITopics | delayed q-learning

Improving Reinforcement Learning Sample-Efficiency using Local Approximation

Prashant, Mohit, Easwaran, Arvind

arXiv.org Artificial Intelligence, Jul-17-2025

In this study, we derive Probably Approximately Correct (PAC) bounds on the asymptotic sample-complexity for RL within the infinite-horizon Markov Decision Process (MDP) setting that are sharper than those in existing literature. The premise of our study is twofold: firstly, the further two states are from each other, transition-wise, the less relevant the value of the first state is when learning the $ε$-optimal value of the second; secondly, the amount of 'effort', sample-complexity-wise, expended in learning the $ε$-optimal value of a state is independent of the number of samples required to learn the $ε$-optimal value of a second state that is a sufficient number of transitions away from the first. Conversely, states within each other's vicinity have values that depend on each other and require a similar number of samples to learn. By approximating the original MDP with smaller MDPs constructed from subsets of the original's state-space, we are able to reduce the sample-complexity by a logarithmic factor to $O(SA \log A)$ timesteps, where $S$ and $A$ are the state and action space sizes. We extend these results to an infinite-horizon, model-free setting by constructing a PAC-MDP algorithm with the aforementioned sample-complexity. We conclude by showing how significant the improvement is, comparing our algorithm against prior work in an experimental setting.
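For context, the PAC-MDP criterion mentioned in this abstract is usually stated roughly as follows. This is a sketch of the common form found in the PAC-MDP literature, not a formula taken from the paper, and the symbols $\mathcal{A}_t$ (the algorithm's policy at timestep $t$), $s_t$ (the state at timestep $t$), $\gamma$ (the discount factor) and $\delta$ (the allowed failure probability) are assumptions that do not appear in the abstract:

$$
\Pr\Big[\, \big|\{\, t : V^{\mathcal{A}_t}(s_t) < V^{*}(s_t) - \varepsilon \,\}\big| \;\le\; \mathrm{poly}\big(S,\, A,\, \tfrac{1}{\varepsilon},\, \tfrac{1}{\delta},\, \tfrac{1}{1-\gamma}\big) \Big] \;\ge\; 1 - \delta
$$

Under this reading, the sample-complexity is the number of timesteps on which the algorithm's policy is not $ε$-optimal, and the $O(SA \log A)$ figure describes how that count scales with $S$ and $A$.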

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv: 2507.12383

Country:

North America > United States (0.14)
Asia (0.14)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

A Hybrid PAC Reinforcement Learning Algorithm

Zehfroosh, Ashkan, Tanner, Herbert G.

arXiv.org Machine Learning, Sep-5-2020

This paper offers a new hybrid probably approximately correct (PAC) reinforcement learning (RL) algorithm for Markov decision processes (MDPs) that intelligently maintains favorable features of its parents. The designed algorithm, referred to as the Dyna-Delayed Q-learning (DDQ) algorithm, combines model-free and model-based learning approaches while outperforming both in most cases. The paper includes a PAC analysis of the DDQ algorithm and a derivation of its sample complexity. Numerical results from a small grid-world example support the claim that the new algorithm is more sample-efficient than its parents.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv: 2009.02602

Country:

North America > United States > Delaware > New Castle County > Newark (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre:

Workflow (0.46)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Directed Exploration in PAC Model-Free Reinforcement Learning

Oh, Min-hwan, Iyengar, Garud

arXiv.org Machine Learning, Aug-30-2018

We study an exploration method for model-free RL that generalizes the counter-based exploration bonus methods and takes into account the long-term exploratory value of actions rather than a single-step look-ahead. We propose a model-free RL method that modifies Delayed Q-learning and utilizes the long-term exploration bonus with provable efficiency. We show that our proposed method finds a near-optimal policy in polynomial time (PAC-MDP), and also provide experimental evidence that our proposed algorithm is an efficient exploration method.
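As a rough illustration of the single-step, counter-based bonus that this abstract takes as its starting point, the sketch below adds a bonus of the form beta / sqrt(1 + n(s, a)) to the Q-value when choosing actions in plain tabular Q-learning. It is not the authors' algorithm and not Delayed Q-learning; the names `count_bonus`, `run_q_learning`, `step`, and all parameter defaults are hypothetical choices made for this example.

```python
import math
import random
from collections import defaultdict


def count_bonus(n_visits, beta=1.0):
    """Single-step bonus (assumed form): largest for untried pairs, shrinking with visits."""
    return beta / math.sqrt(1 + n_visits)


def run_q_learning(step, n_actions, start_state=0, episodes=1, horizon=10,
                   alpha=0.1, gamma=0.9, beta=1.0, seed=0):
    """Tabular Q-learning that acts greedily on Q(s, a) + count_bonus(n(s, a)).

    `step(s, a)` is a hypothetical environment callback returning
    (next_state, reward, done).
    """
    rng = random.Random(seed)
    q = defaultdict(float)     # Q-value estimates, default 0.0
    counts = defaultdict(int)  # visit counts n(s, a)
    for _ in range(episodes):
        s = start_state
        for _ in range(horizon):
            # Score each action by its value plus the one-step exploration bonus.
            scores = [q[(s, a)] + count_bonus(counts[(s, a)], beta)
                      for a in range(n_actions)]
            best = max(scores)
            # Break ties uniformly at random.
            a = rng.choice([i for i, v in enumerate(scores) if v == best])
            s_next, r, done = step(s, a)
            counts[(s, a)] += 1
            # Standard one-step Q-learning target; no bootstrap at terminal states.
            if done:
                target = r
            else:
                target = r + gamma * max(q[(s_next, b)] for b in range(n_actions))
            q[(s, a)] += alpha * (target - q[(s, a)])
            if done:
                break
            s = s_next
    return q, counts


if __name__ == "__main__":
    # Toy single-state environment with zero reward: only the bonus drives choices.
    q, counts = run_q_learning(lambda s, a: (0, 0.0, False), n_actions=2)
    print(dict(counts))
```

Because the bonus depends only on how often the current pair has been tried, it looks one step ahead: it cannot steer the agent toward rarely visited states that lie several transitions away, which is the limitation the abstract's long-term exploration bonus is meant to address.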

delayed q-learning, machine learning, reinforcement learning, (12 more...)

arXiv: 1808.10552

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
