Single-partition adaptive Q-learning
Araújo, João Pedro, Figueiredo, Mário, Botto, Miguel Ayala
This paper introduces single-partition adaptive Q-learning (SPAQL), an algorithm for model-free episodic reinforcement learning (RL) that adaptively partitions the state-action space of a Markov decision process (MDP) while simultaneously learning a time-invariant policy (i.e., the mapping from states to actions does not depend explicitly on the episode time step) that maximizes the cumulative reward. The trade-off between exploration and exploitation is handled by mixing upper confidence bounds (UCB) and Boltzmann exploration during training, with a temperature parameter that is automatically tuned as training progresses. SPAQL improves on adaptive Q-learning (AQL): it converges faster to the optimal solution while using fewer arms. Tests on episodes with a large number of time steps show that SPAQL scales without difficulty, unlike AQL. Based on this empirical evidence, we claim that SPAQL may have higher sample efficiency than AQL, making it a relevant contribution to the field of efficient model-free RL methods.
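The abstract mentions that exploration mixes UCB with Boltzmann (softmax) exploration under an automatically tuned temperature. As a rough illustration of the Boltzmann component only, the sketch below samples an action with probability proportional to exp(Q/T); the function names are illustrative and not taken from the paper, and the temperature-tuning and UCB terms are omitted.

```python
import math
import random

def boltzmann_probs(q_values, temperature):
    """Softmax over Q-values; lower temperature means greedier choices."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def boltzmann_action(q_values, temperature, rng=random):
    """Sample an action index according to the Boltzmann distribution."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

As the temperature is lowered during training, the distribution concentrates on the highest-valued action, shifting behavior from exploration toward exploitation.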
Jul-13-2020