Reinforcement Learning with History-Dependent Dynamic Contexts
Tennenholtz, Guy, Merlis, Nadav, Shani, Lior, Mladenov, Martin, Boutilier, Craig
–arXiv.org Artificial Intelligence
We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time. We consider special cases of the model, with a focus on logistic DCMDPs, which break the exponential dependence on history length by leveraging aggregation functions to determine context transitions. This special structure allows us to derive an upper-confidence-bound style algorithm for which we establish regret bounds. Motivated by our theoretical results, we introduce a practical model-based algorithm for logistic DCMDPs that plans in a latent space and uses optimism over history-dependent features. We demonstrate the efficacy of our approach on a recommendation task (using MovieLens data) where user behavior dynamics evolve in response to recommendations.
arXiv.org Artificial Intelligence
May-17-2023
- Country:
- Asia > Macao (0.04)
- North America
- United States
- Oregon > Multnomah County
- Portland (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- Oregon > Multnomah County
- Canada > Quebec
- Montreal (0.04)
- United States
- Europe
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Italy > Piedmont
- Turin Province > Turin (0.04)
- United Kingdom > England
- Genre:
- Research Report (0.64)
- Industry:
- Leisure & Entertainment (0.45)