 Reinforcement Learning


K-level Reasoning for Zero-Shot Coordination in Hanabi

Neural Information Processing Systems

Work done while at Facebook AI Research. 35th Conference on Neural Information Processing Systems (NeurIPS 2021).

Figure 1: Visualization of various hierarchical training schemas, including sequential KLR, synchronous KLR, synchronous CH, and our new SyKLRBR for 4 levels.






A Provably Efficient Sample Collection Strategy for Reinforcement Learning

Neural Information Processing Systems

One of the challenges in online reinforcement learning (RL) is that the agent needs to trade off exploring the environment against exploiting the samples it has collected to optimize its behavior. Whether we optimize for regret, sample complexity, state-space coverage, or model estimation, each objective calls for a different exploration-exploitation trade-off.

