Dynamic Regret of Policy Optimization in Non-Stationary Environments
Neural Information Processing Systems
We consider reinforcement learning (RL) in episodic MDPs with adversarial full-information reward feedback and unknown fixed transition kernels.
Nov-13-2025, 23:43:15 GMT