Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback
–Neural Information Processing Systems
The standard assumption in reinforcement learning (RL) is that agents observe feedback for their actions immediately.
Aug-19-2025, 08:00:09 GMT
- Country:
  - Asia > Middle East
    - Israel > Tel Aviv District > Tel Aviv (0.04)
    - Jordan (0.04)
  - Europe > Austria (0.04)
  - North America
    - Canada > Quebec > Montreal (0.04)
    - United States
      - California (0.14)
      - Nevada (0.04)
- Technology: