Reinforcement Learning with History-Dependent Dynamic Contexts
Guy Tennenholtz, Nadav Merlis, Lior Shani, Martin Mladenov, Craig Boutilier
We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes contextual MDPs to non-Markov settings where contexts change over time. We consider special cases of the model, focusing on logistic DCMDPs, which break the exponential dependence on history length by using aggregation functions to determine context transitions. This structure allows us to derive an upper-confidence-bound-style algorithm for which we establish regret bounds. Motivated by these theoretical results, we introduce a practical model-based algorithm for logistic DCMDPs that plans in a latent space and uses optimism over history-dependent features. We demonstrate the efficacy of our approach on a recommendation task (using MovieLens data) in which user behavior dynamics evolve in response to recommendations.
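To make the key structural idea concrete: in a logistic DCMDP, context transitions depend on the history only through an aggregate of per-step features, so the transition model conditions on a fixed-size statistic rather than the raw history. Below is a minimal toy sketch of that idea, assuming (purely for illustration, not as the paper's exact formulation) an additive aggregation function and a softmax transition over contexts; the feature function, weight matrix `W`, and all other names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_contexts, feat_dim = 3, 4
W = rng.normal(size=(n_contexts, feat_dim))  # hypothetical context-logit weights


def step_feature(state, action):
    """Hypothetical per-step feature of a (state, action) pair."""
    return np.tanh(rng.normal(size=feat_dim) + state + action)


def next_context(aggregate):
    """Sample the next context from a softmax over aggregated history features."""
    logits = W @ aggregate
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(n_contexts, p=probs)


# Roll out: the agent maintains only the running aggregate, never the raw
# history -- the aggregate is a fixed-size sufficient statistic for context
# transitions, which is what avoids exponential dependence on history length.
aggregate = np.zeros(feat_dim)
for t in range(10):
    state, action = rng.normal(), rng.integers(2)
    aggregate += step_feature(state, action)  # additive aggregation (assumed)
    context = next_context(aggregate)
```

Because the softmax sees only the aggregate, planning and uncertainty estimates can be expressed over this compact statistic, which is the intuition behind the UCB-style analysis described in the abstract.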
arXiv.org Artificial Intelligence
May 17, 2023