Solving Continuous Control via Q-learning

Tim Seyde, Peter Werner, Wilko Schwarting, Igor Gilitschenski, Martin Riedmiller, Daniela Rus, Markus Wulfmeier

arXiv.org Artificial Intelligence 

Recent results have shown that competitive performance can be achieved with strongly reduced, discretized versions of the original action space (Tavakoli et al., 2018; Tang & Agrawal, 2020; Seyde et al., 2021). This raises the question of whether tasks with complex high-dimensional action spaces can instead be solved using simpler critic-only, discrete action-space algorithms. A potential candidate is Q-learning, which only requires learning a critic, with the policy commonly derived via ϵ-greedy or Boltzmann exploration (Watkins & Dayan, 1992; Mnih et al., 2013). While naive Q-learning struggles in high-dimensional action spaces due to the exponential growth in the number of possible action combinations, the multi-agent RL literature has shown that factored value function representations combined with centralized training can alleviate some of these challenges (Sunehag et al., 2017; Rashid et al., 2018), further inspiring transfer to single-agent control settings (Sharma et al., 2017; Tavakoli, 2021). Other methods enable the application of critic-only agents to continuous action spaces but require additional, costly sampling-based optimization (Kalashnikov et al., 2018).
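
To make the factored value function idea concrete, the following is a minimal sketch of a critic that discretizes each continuous action dimension into a small number of bins, outputs one set of per-bin utilities per dimension, and mixes them by averaging so that the greedy action is found by an independent argmax per dimension rather than a search over all joint actions. The class name, network sizes, and the mean-based mixing are illustrative assumptions, not the exact architecture of any of the cited methods.

```python
import torch
import torch.nn as nn


class FactoredQNetwork(nn.Module):
    """Critic with one Q-value head per discretized action dimension.

    The joint Q-value is approximated as the mean of per-dimension
    utilities, so greedy action selection costs O(action_dims * num_bins)
    instead of O(num_bins ** action_dims).
    """

    def __init__(self, obs_dim: int, action_dims: int, num_bins: int, hidden: int = 256):
        super().__init__()
        self.action_dims = action_dims
        self.num_bins = num_bins
        self.torso = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One linear head producing num_bins utilities for every action dimension.
        self.heads = nn.Linear(hidden, action_dims * num_bins)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Per-dimension utilities, shape (batch, action_dims, num_bins).
        z = self.torso(obs)
        return self.heads(z).view(-1, self.action_dims, self.num_bins)

    def joint_q(self, obs: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        # actions: integer bin indices, shape (batch, action_dims).
        utilities = self.forward(obs)
        chosen = utilities.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
        return chosen.mean(dim=-1)  # factored estimate of Q(s, a)

    def greedy_action(self, obs: torch.Tensor) -> torch.Tensor:
        # Independent argmax per action dimension (linear, not exponential, cost).
        return self.forward(obs).argmax(dim=-1)
```

A simple exploration scheme on top of such a critic would, with probability ϵ, replace each dimension's greedy bin with a uniformly sampled one independently per dimension, keeping the cost of exploration linear in the number of action dimensions as well.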
