Dynamic Regret of Adversarial Linear Mixture MDPs
Neural Information Processing Systems
We study reinforcement learning in episodic inhomogeneous MDPs with adversarial full-information rewards and an unknown transition kernel. We consider linear mixture MDPs, whose transition kernel is a linear mixture model, and choose dynamic regret as the performance measure.
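The Python sketch below shows the two objects this setting is built on. It is illustrative only and is not the paper's algorithm. It assumes finite state and action spaces, so the features phi(s'|s,a) can be stored as an array. It builds an inhomogeneous linear mixture kernel P_h(s'|s,a) = <phi(s'|s,a), theta_h>, then computes the dynamic regret sum_k [V_k^{pi_k^c}(s_1) - V_k^{pi_k}(s_1)] of a learner's policies pi_k against an arbitrary comparator sequence pi_k^c. It uses the true kernel only to evaluate the regret. The function names, the toy sizes, and the uniform "learner" are hypothetical choices made for illustration.

```python
import numpy as np

def mixture_kernel(phi, theta):
    """Linear mixture transition: P(s'|s,a) = <phi(s'|s,a), theta>.

    phi has shape (S, A, S, d); theta has shape (d,). Returns shape (S, A, S).
    """
    return phi @ theta

def policy_value(P, r, pi, s1):
    """V_1^pi(s1) by backward induction.

    P: (H, S, A, S) per-step kernels, r: (H, S, A) rewards of one episode,
    pi: (H, S, A) stochastic policy (each pi[h, s] sums to 1).
    """
    H, S, _ = r.shape
    V = np.zeros(S)                      # V_{H+1} = 0
    for h in reversed(range(H)):
        Q = r[h] + P[h] @ V              # Q_h(s, a) = r_h(s, a) + E[V_{h+1}(s')]
        V = (pi[h] * Q).sum(axis=1)      # V_h(s) = E_{a ~ pi_h(.|s)} Q_h(s, a)
    return V[s1]

def greedy_optimal_policy(P, r):
    """Deterministic optimal policy for one episode's reward (P known)."""
    H, S, A = r.shape
    V = np.zeros(S)
    pi = np.zeros((H, S, A))
    for h in reversed(range(H)):
        Q = r[h] + P[h] @ V
        pi[h, np.arange(S), Q.argmax(axis=1)] = 1.0
        V = Q.max(axis=1)
    return pi

def dynamic_regret(P, rewards, learner_pis, comparator_pis, s1=0):
    """Sum over episodes k of V_k^{comparator_k}(s1) - V_k^{learner_k}(s1)."""
    return sum(policy_value(P, r, pc, s1) - policy_value(P, r, pl, s1)
               for r, pl, pc in zip(rewards, learner_pis, comparator_pis))

# Toy instance (all sizes and distributions are illustrative assumptions).
rng = np.random.default_rng(0)
S, A, H, d, K = 3, 2, 4, 2, 5
# Features: d base kernels stacked on the last axis, so any theta on the
# probability simplex yields a valid transition distribution.
phi = np.stack([rng.dirichlet(np.ones(S), size=(S, A)) for _ in range(d)], axis=-1)
thetas = rng.dirichlet(np.ones(d), size=H)     # inhomogeneous: one theta_h per step
P = np.stack([mixture_kernel(phi, thetas[h]) for h in range(H)])
rewards = rng.uniform(size=(K, H, S, A))       # adversary's reward sequence, in [0, 1]
learner = [np.full((H, S, A), 1.0 / A)] * K    # placeholder learner: uniform policy
comparators = [greedy_optimal_policy(P, r) for r in rewards]
print(dynamic_regret(P, rewards, learner, comparators))
```

Choosing each episode's optimal policy as the comparator gives the largest dynamic regret over all comparator sequences. Any other sequence passed as `comparator_pis` yields a value no larger than this.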
Feb-16-2026, 21:18:10 GMT