Near-Optimal Dynamic Regret for Adversarial Linear Mixture MDPs
Neural Information Processing Systems
The interaction is usually modeled as a Markov Decision Process (MDP). Research on MDPs can be broadly divided into two lines according to how rewards are generated. The first line of work [Jaksch et al., 2010, Azar et al., 2013, 2017, He et al., 2021] considers the
Mar-21-2025, 17:59:47 GMT