Near-Optimal Dynamic Regret for Adversarial Linear Mixture MDPs

Neural Information Processing Systems 

The interaction is usually modeled as Markov Decision Processes (MDPs). Research on MDPs can be broadly divided into two lines based on the reward generation mechanism. The first line of work [Jaksch et al., 2010, Azar et al., 2013, 2017, He et al., 2021] considers the

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found