Dynamic Regret of Adversarial Linear Mixture MDPs

Neural Information Processing Systems 

We study reinforcement learning in episodic inhomogeneous MDPs with adversarial full-information rewards and an unknown transition kernel. We consider linear mixture MDPs, whose transition kernel is a linear mixture model, and choose dynamic regret as the performance measure.
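
For orientation, the two notions named above are commonly formalized as follows. This is a sketch in notation that is standard in the linear mixture MDP literature and is not necessarily the paper's own: the feature map \phi, the parameter \theta_h^*, the dimension d, the horizon H, the episode count K, the initial states s_{k,1}, and the comparator policies \pi_k^c are all assumed symbols.

\[
P_h(s' \mid s, a) = \langle \phi(s' \mid s, a), \theta_h^* \rangle, \qquad \theta_h^* \in \mathbb{R}^d \ \text{unknown}, \quad h = 1, \dots, H,
\]
\[
\text{D-Regret}_K = \sum_{k=1}^{K} \left( V_{k,1}^{\pi_k^c}(s_{k,1}) - V_{k,1}^{\pi_k}(s_{k,1}) \right).
\]

Here V_{k,1}^{\pi}(s) denotes the expected total reward of policy \pi in episode k when starting from state s under that episode's adversarially chosen reward, \pi_k is the learner's policy, and \pi_1^c, \dots, \pi_K^c is any sequence of comparator policies that may change from episode to episode. Static regret is the special case in which every comparator is the same fixed policy.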
