Multi-Reward Best Policy Identification

Neural Information Processing Systems

This lower bound guides the design of an optimal exploration policy that attains minimal sample complexity. However, evaluating the bound requires solving a hard non-convex optimization problem.