Logistic $Q$-Learning

Joan Bas-Serrano, Sebastian Curi, Andreas Krause, Gergely Neu

arXiv.org Artificial Intelligence 

We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs. The method is closely related to the classic Relative Entropy Policy Search (REPS) algorithm of Peters et al. (2010), with the key difference that our method introduces a Q-function that enables efficient exact model-free implementation. The main feature of our algorithm (called Q-REPS) is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.

While REPS is elegantly derived from a principled linear-programming (LP) formulation of optimal control in MDPs, it has the serious shortcoming that its faithful implementation requires access to the true MDP for both the policy evaluation and improvement steps, even at deployment time. The usual way to address this limitation is to use an empirical approximation to the policy evaluation step and to project the policy from the improvement step into a parametric space (Deisenroth et al., 2013), losing all the theoretical guarantees of REPS in the process.
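To make the abstract's starting point concrete, the following is a minimal LaTeX sketch of the occupancy-measure LP behind REPS and Q-REPS and its relative-entropy regularization. The notation here ($\mu$ for the occupancy measure, $P$ for the transition kernel, $\mu_0$ for a reference measure, $\eta > 0$ for the regularization strength) is assumed for illustration; the paper's exact constraints, divergence terms, and discounting conventions may differ.

% Illustrative sketch only: the occupancy-measure LP for MDP control and a
% REPS-style relative-entropy regularization. Notation (mu, P, r, mu_0, eta)
% is assumed, not taken from the listing above; details follow the paper.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
The classic linear program over state--action occupancy measures $\mu$
(average-reward setting, for simplicity):
\begin{align*}
  \max_{\mu \ge 0}\;& \sum_{s,a} \mu(s,a)\, r(s,a)\\
  \text{s.t.}\;& \sum_{a'} \mu(s',a') = \sum_{s,a} P(s' \mid s,a)\, \mu(s,a)
    \quad \forall s', \qquad \sum_{s,a} \mu(s,a) = 1.
\end{align*}
REPS softens the objective with a relative-entropy penalty toward a
reference measure $\mu_0$ (e.g., the previous iterate):
\[
  \max_{\mu \ge 0}\; \sum_{s,a} \mu(s,a)\, r(s,a)
    - \frac{1}{\eta}\, D\!\left(\mu \,\middle\|\, \mu_0\right),
\]
whose optimizer reweights $\mu_0$ multiplicatively,
$\mu(s,a) \propto \mu_0(s,a)\, e^{\eta\, \delta(s,a)}$ for an
advantage-like quantity $\delta$. In Q-REPS a Q-function takes this
role, giving the softmax policy update
$\pi_{k+1}(a \mid s) \propto \pi_k(a \mid s)\, e^{\eta\, Q_k(s,a)}$.
\end{document}

The practical point, echoing the REPS critique above: in vanilla REPS the quantity $\delta$ involves expectations under the true transition kernel $P$, so a faithful implementation needs the model even at deployment time; a Q-function, by contrast, can be fit from sampled transitions, which is what enables the exact model-free implementation the abstract refers to.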
