Regret Bounds and Reinforcement Learning Exploration of EXP-based Algorithms

Sep-20-2020–arXiv.org Artificial Intelligence

EXP-based algorithms are often used for exploration in multi-armed bandit problems. We revisit the EXP3.P algorithm and establish both lower and upper regret bounds in the Gaussian multi-armed bandit setting, as well as for a more general class of reward distributions. Unlike classical regret analyses, ours do not require bounded rewards. We also extend EXP4 from multi-armed bandits to reinforcement learning to incentivize exploration by multiple agents. The resulting algorithm has been tested on hard-to-explore games, where it shows improved exploration compared to the state of the art.
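For readers unfamiliar with EXP3.P, the following is a minimal sketch of the textbook update (exponential weights mixed with uniform exploration, plus a biased importance-weighted gain estimate), run on a Gaussian bandit with unbounded rewards. It is not the paper's implementation: the function name `exp3p`, the helper `gaussian_reward`, the arm means, and the values of `eta`, `gamma`, and `beta` are illustrative assumptions and do not come from the paper's analysis.

```python
import numpy as np


def exp3p(reward_fn, n_arms, horizon, eta, gamma, beta, rng):
    """Textbook-style EXP3.P loop (illustrative sketch, not the paper's code)."""
    gains = np.zeros(n_arms)             # cumulative biased gain estimates
    pulls = np.zeros(n_arms, dtype=int)  # number of times each arm was played
    for _ in range(horizon):
        # Exponential weights, shifted by the max for numerical stability.
        weights = np.exp(eta * (gains - gains.max()))
        # Mix with the uniform distribution so every arm keeps probability >= gamma / n_arms.
        probs = (1.0 - gamma) * weights / weights.sum() + gamma / n_arms
        arm = rng.choice(n_arms, p=probs)
        reward = reward_fn(arm, rng)
        pulls[arm] += 1
        # Biased estimate: every arm receives beta / p_i, the played arm also gets reward / p_i.
        estimate = beta / probs
        estimate[arm] += reward / probs[arm]
        gains += estimate
    return pulls


# Hypothetical Gaussian bandit: unit-variance rewards with assumed means.
means = np.array([0.0, 0.5, 2.0])


def gaussian_reward(arm, rng):
    return rng.normal(means[arm], 1.0)


rng = np.random.default_rng(0)
pulls = exp3p(gaussian_reward, n_arms=3, horizon=3000,
              eta=0.05, gamma=0.1, beta=0.01, rng=rng)
# Pseudo-regret: expected reward lost relative to always playing the best arm.
pseudo_regret = float(pulls @ (means.max() - means))
print("pulls per arm:", pulls, "pseudo-regret:", pseudo_regret)
```

The mixing term `gamma / n_arms` keeps every sampling probability bounded away from zero, so the importance-weighted estimates stay finite even when rewards are unbounded, as in the Gaussian case here.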

data mining, machine learning, reinforcement learning

Country:
- North America > United States
  - Pennsylvania > Allegheny County
    - Pittsburgh (0.04)
  - Illinois > Cook County
    - Evanston (0.04)
  - California > Santa Clara County
    - Palo Alto (0.04)

Genre:
- Research Report (0.82)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (0.74)
  - Artificial Intelligence > Machine Learning
    - Neural Networks (1.00)
    - Reinforcement Learning (0.85)
