Dynamic Regret of Policy Optimization in Non-Stationary Environments

Neural Information Processing Systems 

We consider reinforcement learning (RL) in episodic Markov decision processes (MDPs) with adversarial full-information reward feedback and unknown, fixed transition kernels.
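To make the objective in the title concrete, here is a minimal sketch of how dynamic regret could be evaluated in this setting. It assumes a tabular MDP with horizon H, a fixed start state s1, transitions P that are shared by all episodes, and one reward function r_k per episode chosen by the adversary and fully revealed after the episode. It also assumes the common definition of dynamic regret, which compares the learner's value in each episode with that episode's own optimal value instead of with a single fixed policy. The paper's exact definition is not reproduced in this excerpt. All names (`policy_value`, `optimal_value`, `dynamic_regret`) and the nested-list encoding are hypothetical, and the learner's update rule is not shown.

```python
from typing import List

# Hypothetical tabular encoding: H steps, S states, A actions.
Reward = List[List[List[float]]]             # r[h][s][a]
Policy = List[List[List[float]]]             # pi[h][s][a] = prob. of action a in state s at step h
Transition = List[List[List[List[float]]]]   # P[h][s][a][s'], fixed across episodes


def _backup(P: Transition, r: Reward, h: int, s: int, a: int, V_next: List[float]) -> float:
    """One-step Bellman backup: r_h(s, a) + sum_{s'} P_h(s' | s, a) * V_{h+1}(s')."""
    return r[h][s][a] + sum(p * v for p, v in zip(P[h][s][a], V_next))


def policy_value(P: Transition, r: Reward, pi: Policy, s1: int) -> float:
    """Value V_1^{pi}(s1) of a (possibly stochastic) policy under reward r, by backward induction."""
    H, S, A = len(r), len(r[0]), len(r[0][0])
    V_next = [0.0] * S  # terminal values V_{H+1} = 0
    for h in reversed(range(H)):
        V_next = [
            sum(pi[h][s][a] * _backup(P, r, h, s, a, V_next) for a in range(A))
            for s in range(S)
        ]
    return V_next[s1]


def optimal_value(P: Transition, r: Reward, s1: int) -> float:
    """Optimal value V_1^*(s1) under reward r (greedy maximization at every step)."""
    H, S, A = len(r), len(r[0]), len(r[0][0])
    V_next = [0.0] * S
    for h in reversed(range(H)):
        V_next = [max(_backup(P, r, h, s, a, V_next) for a in range(A)) for s in range(S)]
    return V_next[s1]


def dynamic_regret(P: Transition, rewards: List[Reward], policies: List[Policy], s1: int = 0) -> float:
    """Sum over episodes of (per-episode optimal value - learner's value).

    The comparator is re-optimized for every episode's reward r_k, unlike
    static regret, which compares against one fixed policy for all episodes.
    """
    return sum(
        optimal_value(P, r_k, s1) - policy_value(P, r_k, pi_k, s1)
        for r_k, pi_k in zip(rewards, policies)
    )


if __name__ == "__main__":
    # H = 1, S = 1, A = 2: both actions stay in state 0.
    P = [[[[1.0], [1.0]]]]
    # The adversary flips which action is rewarded between the two episodes.
    rewards = [[[[1.0, 0.0]]], [[[0.0, 1.0]]]]
    uniform = [[[0.5, 0.5]]]
    print(dynamic_regret(P, rewards, [uniform, uniform]))  # → 1.0
```

In this toy run the uniform learner collects 0.5 per episode while each episode's optimum is 1, so its dynamic regret is 1.0. The best single fixed action also collects only 1 in total, so static regret would be 0. This gap is why dynamic regret is the natural yardstick when the reward sequence is non-stationary.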
