 behavior policy




31839b036f63806cba3f47b93af8ccb5-Paper.pdf

Neural Information Processing Systems

Offline reinforcement learning (RL) tasks require the agent to learn from a precollected dataset with no further interactions with the environment. Despite the potential to surpass the behavioral policies, RL-based methods are generally impractical due to the training instability and bootstrapping the extrapolation errors, which always require careful hyperparameter tuning via online evaluation.








Optimal Treatment Allocation for Efficient Policy Evaluation in Sequential Decision Making

Ting Li

Neural Information Processing Systems

A/B testing is critical for modern technological companies to evaluate the effectiveness of newly developed products against standard baselines. This paper studies optimal designs that aim to maximize the amount of information obtained from online experiments to estimate treatment effects accurately.