Near-Optimal Dynamic Regret for Adversarial Linear Mixture MDPs

Oct-10-2025, 04:39:22 GMT–Neural Information Processing Systems

The interaction is usually modeled as a Markov Decision Process (MDP). Research on MDPs can be broadly divided into two lines based on the reward generation mechanism. The first line of work [Jaksch et al., 2010, Azar et al., 2013, 2017, He et al., 2021] considers the



