Online Episodic Convex Reinforcement Learning

Moreno, Bianca Marin, Eldowa, Khaled, Gaillard, Pierre, Brégère, Margaux, Oudjane, Nadia

May-13-2025–arXiv.org Machine Learning

We study online learning in episodic finite-horizon Markov decision processes (MDPs) with convex objective functions, known as the concave utility reinforcement learning (CURL) problem. This setting generalizes RL from linear to convex losses on the state-action distribution induced by the agent's policy. The non-linearity of CURL invalidates classical Bellman equations and requires new algorithmic approaches. We introduce the first algorithm achieving near-optimal regret bounds for online CURL without any prior knowledge on the transition function. To achieve this, we use an online mirror descent algorithm with varying constraint sets and a carefully designed exploration bonus. We then address for the first time a bandit version of CURL, where the only feedback is the value of the objective function on the state-action distribution induced by the agent's policy. We achieve a sub-linear regret bound for this more challenging problem by adapting techniques from bandit convex optimization to the MDP setting.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

May-13-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Massachusetts > Middlesex County > Cambridge (0.04)
- Europe
  - Spain
    - Canary Islands (0.04)
    - Andalusia > Cádiz Province
      - Cadiz (0.04)
  - Italy > Lombardy
    - Milan (0.04)
  - France
    - Île-de-France > Paris
      - Paris (0.04)
    - Auvergne-Rhône-Alpes > Isère
      - Grenoble (0.04)

Genre:
- Research Report (0.81)
- Workflow (0.67)

Industry:
- Education > Educational Setting (0.34)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.34)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found