Near-OptimalRegretforAdversarialMDPwith DelayedBanditFeedback

Neural Information Processing Systems 

The standard assumption in reinforcement learning (RL) is that agents observe feedback for their actions immediately. However, in practice feedback is often observedindelay.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found