Near-Optimal Dynamic Regret for Adversarial Linear Mixture MDPs
Neural Information Processing Systems
The interaction is usually modeled as a Markov Decision Process (MDP). Research on MDPs can be broadly divided into two lines based on the reward generation mechanism. The first line of work [Jaksch et al., 2010; Azar et al., 2013, 2017; He et al., 2021] considers the
Oct-10-2025, 04:39:22 GMT