Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning

Li, Yuanzhi, Wang, Ruosong, Yang, Lin F.

Oct-31-2021–arXiv.org Artificial Intelligence

Reinforcement learning (RL) is one of the most important paradigms in machine learning. What distinguishes RL from other paradigms is that it models the long-term effects of decisions. For instance, in a finite-horizon Markov decision process (MDP), one of the most fundamental models for RL, an agent interacts with the environment for a total of H steps and receives a sequence of H random reward values, along with stochastic state transitions, as feedback. The goal of the agent is to find a policy that maximizes the expected sum of these reward values rather than any single one of them. Since decisions made at early stages can significantly affect the future, the agent must take possible future transitions into account when choosing its policy. On the other hand, when H = 1, RL reduces to the contextual bandit problem, in which it suffices to act myopically to achieve optimality. Because the horizon length plays such an important role in RL, Jiang and Agarwal [JA18] propose to study how the sample complexity of RL depends on it. More formally, consider the episodic RL setting, where the horizon length is H and the underlying MDP has unknown, time-invariant transition probabilities and rewards.
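To make the contrast between H = 1 and longer horizons concrete, here is a minimal sketch of backward induction on a tiny tabular MDP with time-invariant transitions and rewards. It is only an illustration of planning with a known model, not the learning algorithm studied in the paper; the function name `backward_induction` and the two-state toy MDP are assumptions introduced for this example.

```python
def backward_induction(P, R, H):
    """Plan in a finite-horizon MDP with time-invariant dynamics.

    P[s][a] is a list of next-state probabilities, R[s][a] is the expected
    reward of taking action a in state s, and H is the horizon length.
    Returns (V, policy): V[s] is the optimal expected sum of H rewards from
    state s, and policy[h][s] is an optimal action at step h (0-indexed).
    """
    n_states = len(P)
    V = [0.0] * n_states  # value when no steps remain
    policy = []
    for _ in range(H):
        new_V, greedy = [], []
        for s in range(n_states):
            # One-step lookahead: immediate reward plus expected future value.
            q = [
                R[s][a] + sum(p * V[s2] for s2, p in enumerate(P[s][a]))
                for a in range(len(P[s]))
            ]
            best = max(range(len(q)), key=q.__getitem__)
            greedy.append(best)
            new_V.append(q[best])
        V = new_V
        policy.append(greedy)
    policy.reverse()  # computed from the last step backwards
    return V, policy


# Hypothetical toy MDP (not from the paper).
# State 0: action 0 pays 0.5 and stays; action 1 pays 0 and moves to state 1.
# State 1: either action pays 1 and stays.
P = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
R = [
    [0.5, 0.0],
    [1.0, 1.0],
]

V1, pi1 = backward_induction(P, R, H=1)
V3, pi3 = backward_induction(P, R, H=3)
print("H=1: first action in state 0 =", pi1[0][0], "value =", V1[0])
print("H=3: first action in state 0 =", pi3[0][0], "value =", V3[0])
```

With H = 1 the planner takes the immediate reward in state 0 (the myopic choice), whereas with H = 3 it gives up that reward in order to reach the higher-paying state 1, which is the long-term effect the paragraph above describes.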

