Best of Both Worlds: Regret Minimization versus Minimax Play

Müller, Adrian, Schneider, Jon, Skoulakis, Stratis, Viano, Luca, Cevher, Volkan

Feb-17-2025–arXiv.org Machine Learning

In this paper, we investigate the existence of online learning algorithms with bandit feedback that simultaneously guarantee $O(1)$ regret compared to a given comparator strategy, and $O(\sqrt{T})$ regret compared to the best strategy in hindsight, where $T$ is the number of rounds. We provide the first affirmative answer to this question. In the context of symmetric zero-sum games, both in normal- and extensive form, we show that our results allow us to guarantee to risk at most $O(1)$ loss while being able to gain $\Omega(T)$ from exploitable opponents, thereby combining the benefits of both no-regret algorithms and minimax play.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

Feb-17-2025

arXiv.org PDF

Add feedback

Country:
- Europe (0.28)

Genre:
- Research Report > New Finding (0.88)

Industry:
- Leisure & Entertainment > Games (0.93)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning (1.00)
    - Representation & Reasoning > Search (0.60)
  - Data Science > Data Mining
    - Big Data (0.46)
  - Game Theory (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found