Slowly Changing Adversarial Bandit Algorithms are Efficient for Discounted MDPs

Feb-9-2023–arXiv.org Artificial Intelligence

Reinforcement learning generalizes bandit problems, with the additional difficulties of a longer planning horizon and an unknown transition kernel. We show that, under some mild assumptions, *any* slowly changing adversarial bandit algorithm that enjoys optimal regret in adversarial bandits can achieve optimal (in its dependence on $T$) expected regret in infinite-horizon discounted MDPs, without using Bellman backups. The slowly changing property required by our generalization is mild, and it has also been noted in the online Markov decision process literature. We also examine the applicability of our reduction to a well-known adversarial bandit algorithm, EXP3.
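For readers unfamiliar with EXP3, the following is a minimal sketch of the textbook algorithm in the plain adversarial bandit setting, with rewards assumed to lie in [0, 1]. It is not the reduction to discounted MDPs described in the abstract, and the names `exp3_probabilities`, `run_exp3`, `reward_fn`, and the default `gamma=0.1` are illustrative assumptions rather than anything taken from the paper.

```python
import math
import random


def exp3_probabilities(weights, gamma):
    """Mix the normalized weights with uniform exploration at rate gamma."""
    total = sum(weights)
    k = len(weights)
    return [(1.0 - gamma) * w / total + gamma / k for w in weights]


def run_exp3(reward_fn, num_arms, num_rounds, gamma=0.1, seed=0):
    """Run textbook EXP3 and return the final sampling distribution.

    reward_fn(t, arm) is a hypothetical callback returning a reward in [0, 1].
    """
    rng = random.Random(seed)
    weights = [1.0] * num_arms
    for t in range(num_rounds):
        probs = exp3_probabilities(weights, gamma)
        # Sample one arm from the current distribution.
        arm = rng.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        # Importance-weighted estimate: only the pulled arm is updated.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)
        # Rescale to keep the weights bounded; probabilities are unchanged.
        total = sum(weights)
        weights = [w / total for w in weights]
    return exp3_probabilities(weights, gamma)


if __name__ == "__main__":
    # Toy environment (an assumption for illustration): arm 0 always pays 1.
    final_probs = run_exp3(lambda t, arm: 1.0 if arm == 0 else 0.0,
                           num_arms=3, num_rounds=2000)
    print(final_probs)
```

Because each round multiplies a single weight by a bounded factor, the sampling distribution moves only a little between consecutive rounds, which is the intuition behind calling such an algorithm slowly changing.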

data mining, machine learning, reinforcement learning

Country:
- North America > United States
  - Illinois > Cook County > Chicago (0.04)
- Europe > United Kingdom
  - England > Greater London > London (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (1.00)
  - Artificial Intelligence > Machine Learning
    - Reinforcement Learning (0.90)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.48)
