Solving General-Utility Markov Decision Processes in the Single-Trial Regime with Online Planning

Santos, Pedro P., Sardinha, Alberto, Melo, Francisco S.

May-22-2025–arXiv.org Artificial Intelligence

In this work, we contribute the first approach for solving infinite-horizon discounted general-utility Markov decision processes (GUMDPs) in the single-trial regime, i.e., when the agent's performance is evaluated based on a single trajectory. First, we provide fundamental results on policy optimization in the single-trial regime: we investigate which class of policies suffices for optimality, cast our problem as an equivalent MDP, and study the computational hardness of policy optimization. Second, we show how online planning techniques, in particular a Monte-Carlo tree search algorithm, can be leveraged to solve GUMDPs in the single-trial regime. Third, we provide experimental results showcasing the superior performance of our approach in comparison to relevant baselines.
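
To illustrate why evaluating on a single trajectory differs from evaluating in expectation, the following minimal Python sketch compares the two for a nonlinear objective. It is not taken from the paper: it assumes the objective is a function of the discounted state-action occupancy, truncates the infinite horizon at a finite length, and uses a hypothetical quadratic cost along with made-up states, actions, and function names.

```python
def empirical_occupancy(trajectory, gamma):
    """Discounted empirical occupancy of ONE (truncated) trajectory.

    Assumption: each (state, action) pair at step t receives mass
    (1 - gamma) * gamma**t.
    """
    occ = {}
    for t, state_action in enumerate(trajectory):
        occ[state_action] = occ.get(state_action, 0.0) + (1.0 - gamma) * gamma ** t
    return occ


def average_occupancy(trajectories, gamma):
    """Occupancy averaged over many trajectories (stand-in for the expectation)."""
    avg = {}
    for trajectory in trajectories:
        for key, mass in empirical_occupancy(trajectory, gamma).items():
            avg[key] = avg.get(key, 0.0) + mass / len(trajectories)
    return avg


def quadratic_cost(occ):
    """Hypothetical nonlinear objective: sum of squared occupancy masses."""
    return sum(mass * mass for mass in occ.values())


gamma, horizon = 0.9, 50
# Two equally likely trajectories that never leave their starting state.
trials = [[("A", "stay")] * horizon, [("B", "stay")] * horizon]

# Single-trial regime: score each trajectory on its own, then average the scores.
single_trial_cost = sum(
    quadratic_cost(empirical_occupancy(trajectory, gamma)) for trajectory in trials
) / len(trials)

# Expected-occupancy regime: average the occupancies first, then score once.
infinite_trial_cost = quadratic_cost(average_occupancy(trials, gamma))
```

In this toy setup the two costs differ because the objective is nonlinear in the occupancy; with a linear objective, as in a standard MDP, they would coincide.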

gumdp, machine learning, reinforcement learning

Country:
- Europe (0.28)
- South America > Brazil (0.28)

Genre:
- Research Report > New Finding (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning
    - Planning & Scheduling (1.00)
    - Search (0.86)
  - Machine Learning
    - Reinforcement Learning (0.68)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.84)
