Anytime-Competitive Reinforcement Learning with Policy Prior

Neural Information Processing Systems 

In contrast, the goal of A-CMDP is to optimize the expected reward while guaranteeing a bounded cost in each round of any episode against a policy prior.
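The per-round guarantee described above can be illustrated with a small check. This is a minimal sketch, not the paper's algorithm: it assumes the bound takes the form "cumulative cost up to round h is at most (1 + lam) times the prior's cumulative cost plus h * b", where the function name `is_anytime_competitive` and the slack parameters `lam` and `b` are hypothetical choices made for illustration and are not stated in the text.

```python
from itertools import accumulate


def is_anytime_competitive(costs, prior_costs, lam=0.0, b=0.0):
    """Check a per-round cost bound against a policy prior.

    ASSUMPTION: the bound is taken to be
        cumulative_cost[h] <= (1 + lam) * prior_cumulative_cost[h] + h * b
    for every round h = 1..H. The exact form and the names lam and b are
    illustrative and not given in the source text.
    """
    if len(costs) != len(prior_costs):
        raise ValueError("cost sequences must cover the same rounds")
    cumulative = accumulate(costs)
    prior_cumulative = accumulate(prior_costs)
    # The bound must hold at every round, not only at the end of the episode.
    return all(
        c <= (1.0 + lam) * p + h * b
        for h, (c, p) in enumerate(zip(cumulative, prior_cumulative), start=1)
    )
```

For example, a policy that front-loads its cost can match the prior's total episode cost and still violate the bound in an early round, which is the difference between a per-round guarantee and an episode-level one.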
