Switching the Loss Reduces the Cost in Batch Reinforcement Learning
Alex Ayoub, Kaiwen Wang, Vincent Liu, Samuel Robertson, James McInerney, Dawen Liang, Nathan Kallus, Csaba Szepesvári
arXiv.org Artificial Intelligence
In offline reinforcement learning (RL), also known as batch RL, we often want agents that learn how to achieve a goal from a fixed dataset using as few samples as possible. A standard approach in this setting is fitted Q-iteration (FQI) [Ernst et al., 2005], which iteratively minimizes the regression error on the batch dataset. In this work we propose a simple and principled improvement to FQI, using log-loss (FQI-log), and prove that it can achieve a much faster convergence rate. In particular, the number of samples it requires to learn a near-optimal policy scales with the cost of the optimal policy, leading to a so-called small-cost bound, the RL analogue of a small-loss bound in supervised learning. We highlight that FQI-log is the first computationally efficient batch RL algorithm to achieve a small-cost bound.
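The single change FQI-log makes to standard FQI is the regression loss used in each iteration: Bellman targets are fit with log-loss (binary cross-entropy) rather than squared error. The sketch below illustrates that loop under some simplifying assumptions of ours, not the paper's implementation: a finite action set, a feature map phi(s, a), costs normalized to [0, 1], and a sigmoid-parameterized Q-function so targets stay in [0, 1]; all names and hyperparameters are illustrative.

```python
# Minimal sketch of fitted Q-iteration with log-loss (FQI-log).
# Assumptions (ours, not the paper's): costs in [0, 1], finite actions,
# Q_w(s, a) = sigmoid(w . phi(s, a)) so values and targets stay in [0, 1].
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fqi_log(dataset, phi, n_actions, gamma=0.99, n_iters=50, lr=0.5, epochs=200):
    """dataset: list of (s, a, c, s_next) transitions with cost c in [0, 1]."""
    d = phi(dataset[0][0], dataset[0][1]).shape[0]
    w = np.zeros(d)  # parameters of Q_w(s, a) = sigmoid(w . phi(s, a))

    for _ in range(n_iters):
        # Bellman targets, scaled by (1 - gamma) to keep them in [0, 1]
        # (a normalization choice of this sketch).
        X, y = [], []
        for s, a, c, s_next in dataset:
            q_next = min(sigmoid(w @ phi(s_next, b)) for b in range(n_actions))
            X.append(phi(s, a))
            y.append((1 - gamma) * c + gamma * q_next)
        X, y = np.array(X), np.array(y)

        # Regression step with log-loss (binary cross-entropy) instead of
        # squared error -- the one change that distinguishes FQI-log from FQI.
        for _ in range(epochs):
            p = sigmoid(X @ w)
            grad = X.T @ (p - y) / len(y)  # gradient of the mean log-loss
            w -= lr * grad

    # Greedy (cost-minimizing) policy with respect to the final Q estimate.
    def policy(s):
        return int(np.argmin([sigmoid(w @ phi(s, b)) for b in range(n_actions)]))
    return policy
```

Everything else (the batch dataset, the iterative target construction, the greedy policy extraction) is unchanged from standard FQI, which is what makes the switch computationally cheap.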
Mar-12-2024