Occupancy-based Policy Gradient: Estimation, Convergence, and Optimality

Neural Information Processing Systems 

Occupancy functions play an instrumental role in reinforcement learning (RL) for guiding exploration, handling distribution shift, and optimizing general objectives beyond the expected return. Yet, computationally efficient policy optimization methods that use (only) occupancy functions are virtually non-existent. In this paper, we establish the theoretical foundations of model-free policy gradient (PG) methods that compute the gradient through the occupancy for both online and offline RL, without modeling value functions. Our algorithms reduce gradient estimation to squared-loss regression and are computationally oracle-efficient. We characterize the sample complexities of both local and global convergence, accounting for both finite-sample estimation error and the roles of exploration (online) and data coverage (offline). Occupancy-based PG naturally handles arbitrary offline data distributions, and, with one-line algorithmic changes, can be adapted to optimize any differentiable objective functional.
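As a concrete illustration of what "computing the gradient through the occupancy" means, the sketch below implements the exact tabular version of the idea: it forms the discounted state-action occupancy d^{π_θ} in closed form and differentiates J(θ) = ⟨d^{π_θ}, r⟩/(1−γ) with automatic differentiation. This is only a minimal sketch under assumed names (P, r, mu, occupancy, a softmax policy parameterization); the paper's model-free algorithms instead estimate this gradient from samples via squared-loss regression.

```python
import jax
import jax.numpy as jnp

# Hypothetical tabular MDP (illustrative, not from the paper):
# P[s, a, s'] transition kernel, r[s, a] rewards, mu initial state
# distribution, gamma discount factor.

def occupancy(theta, P, mu, gamma):
    # Softmax policy pi(a|s) with logits theta of shape (S, A).
    pi = jax.nn.softmax(theta, axis=1)
    S, _ = pi.shape
    # State-to-state kernel under pi: P_pi[s, s'] = sum_a pi(a|s) P(s'|s, a).
    P_pi = jnp.einsum('sa,sap->sp', pi, P)
    # Discounted state occupancy: d = (1 - gamma) (I - gamma P_pi^T)^{-1} mu.
    d_s = (1 - gamma) * jnp.linalg.solve(jnp.eye(S) - gamma * P_pi.T, mu)
    # State-action occupancy d(s, a) = d(s) pi(a|s).
    return d_s[:, None] * pi

def objective(theta, P, r, mu, gamma):
    # Expected return written purely through the occupancy:
    # J(theta) = <d_theta, r> / (1 - gamma); no value function appears.
    d = occupancy(theta, P, mu, gamma)
    return jnp.sum(d * r) / (1 - gamma)

# Occupancy-based policy gradient: differentiate J through d_theta.
grad_J = jax.grad(objective)

# Usage on a small random MDP.
S, A, gamma = 5, 3, 0.9
P = jax.random.dirichlet(jax.random.PRNGKey(0), jnp.ones(S), shape=(S, A))
r = jax.random.uniform(jax.random.PRNGKey(1), (S, A))
mu = jnp.ones(S) / S
g = grad_J(jnp.zeros((S, A)), P, r, mu, gamma)  # exact gradient, shape (S, A)
```

In the model-free setting the closed-form solve above is unavailable; the paper's estimators replace it with regression oracles fit on sampled transitions, which is what makes the approach oracle-efficient.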
