Multi-Reward Best Policy Identification

Neural Information Processing Systems 

This bound guides the design of an optimal exploration policy attaining minimal sample complexity. However, evaluating this lower bound requires solving a hard non-convex optimization problem.
