Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback
–Neural Information Processing Systems
The standard assumption in reinforcement learning (RL) is that agents observe feedback for their actions immediately.
Neural Information Processing Systems
Aug-19-2025, 08:00:09 GMT
- Country:
- Europe > Austria (0.04)
- North America
- United States
- California (0.14)
- Nevada (0.04)
- Canada > Quebec
- Montreal (0.04)
- United States
- Asia > Middle East
- Jordan (0.04)
- Israel > Tel Aviv District
- Tel Aviv (0.04)
- Technology: