Variational Policy Gradient Method for Reinforcement Learning with General Utilities

Feb-8-2026, 00:04:46 GMT–Neural Information Processing Systems

In recent years, reinforcement learning (RL) systems with general goals beyond a cumulative sum of rewards have gained traction, for example in constrained problems, exploration, and acting upon prior experiences. In this paper, we consider policy optimization in Markov Decision Problems where the objective is a general concave utility function of the state-action occupancy measure, which subsumes several of the aforementioned examples as special cases.
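To make the objective in the abstract concrete, the following is a minimal sketch rather than the paper's method. It assumes the usual discounted definition of the state-action occupancy measure, lam[s][a] = sum over t of gamma^t * Pr(s_t = s, a_t = a), and evaluates two utilities of it: the standard cumulative reward, which is linear in the occupancy measure, and an entropy-style utility, which is concave. The two-state MDP, the policy, and all function names (`occupancy_measure`, `linear_utility`, `entropy_utility`) are hypothetical and chosen only for illustration.

```python
import math

def occupancy_measure(P, pi, xi, gamma, horizon=500):
    """Truncated discounted occupancy: lam[s][a] ~= sum_t gamma^t Pr(s_t=s, a_t=a).

    P[s][a][s2] is the transition probability, pi[s][a] the policy,
    xi[s] the initial state distribution (all illustrative assumptions).
    """
    n_s, n_a = len(pi), len(pi[0])
    lam = [[0.0] * n_a for _ in range(n_s)]
    d = list(xi)  # state distribution at time t
    for t in range(horizon):
        nxt = [0.0] * n_s
        for s in range(n_s):
            for a in range(n_a):
                p_sa = d[s] * pi[s][a]          # Pr(s_t=s, a_t=a)
                lam[s][a] += (gamma ** t) * p_sa
                for s2 in range(n_s):
                    nxt[s2] += p_sa * P[s][a][s2]
        d = nxt
    return lam

def linear_utility(lam, r):
    """Cumulative discounted reward, written as the inner product <r, lam>."""
    return sum(lam[s][a] * r[s][a]
               for s in range(len(lam)) for a in range(len(lam[0])))

def entropy_utility(lam, gamma):
    """Entropy of the normalized occupancy (1 - gamma) * lam; concave in lam."""
    total = 0.0
    for row in lam:
        for v in row:
            p = (1.0 - gamma) * v
            if p > 0.0:
                total -= p * math.log(p)
    return total

# Hypothetical two-state, two-action MDP used only for illustration.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
pi = [[0.5, 0.5], [0.3, 0.7]]
xi = [1.0, 0.0]
gamma = 0.9
r = [[1.0, 0.0], [0.0, 2.0]]

lam = occupancy_measure(P, pi, xi, gamma)
print("cumulative reward:", linear_utility(lam, r))
print("entropy utility:  ", entropy_utility(lam, gamma))
```

Both quantities depend on the policy only through the occupancy measure. Swapping `linear_utility` for `entropy_utility` changes the objective without changing how the occupancy measure is computed, which is the sense in which a general utility subsumes the cumulative-reward case.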

artificial intelligence, machine learning, reinforcement learning

Country:
- North America
  - United States
    - New Jersey > Mercer County
      - Princeton (0.04)
    - Maryland > Prince George's County
      - Adelphi (0.04)
  - Canada
    - Alberta (0.14)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.04)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)

Industry:
- Government (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Optimization (1.00)
  - Machine Learning > Reinforcement Learning (1.00)
