Goto

Collaborating Authors

 qkh


Near-OptimalRegretforAdversarialMDPwith DelayedBanditFeedback

Neural Information Processing Systems

The standard assumption in reinforcement learning (RL) is that agents observe feedback for their actions immediately. However, in practice feedback is often observedindelay.