Collaborating Authors

 alekhagarwal


6734fa703f6633ab896eecbdfad8953a-Supplemental.pdf

Neural Information Processing Systems

In the former (RL), actions taken at early stages could substantially impact the future; with regard to planning, the agent must consider not only the immediate reward but also the possible future transitions into differing states.


0fd489e5e393f61b355be86ed4c24a54-Paper-Conference.pdf

Neural Information Processing Systems

When solving real-world problems, where contexts and actions are complex and high-dimensional (e.g., users' social graph, items' visual description), it is crucial to provide the bandit algorithm with a suitable representation of the context-action space.