Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback

Aug-19-2025, 08:00:09 GMT–Neural Information Processing Systems

The standard assumption in reinforcement learning (RL) is that agents observe feedback for their actions immediately.

artificial intelligence, machine learning, reinforcement learning

Country:
- Europe > Austria (0.04)
- North America
  - United States
    - California (0.14)
    - Nevada (0.04)
  - Canada > Quebec
    - Montreal (0.04)
- Asia > Middle East
  - Jordan (0.04)
  - Israel > Tel Aviv District
    - Tel Aviv (0.04)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning
    - Agents (0.45)
    - Uncertainty (0.45)
  - Machine Learning
    - Reinforcement Learning (0.66)
    - Learning Graphical Models (0.46)
