A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation

Mar-26-2025, 20:08:57 GMT–Neural Information Processing Systems

The exploration-exploitation dilemma has been a central challenge in reinforcement learning (RL) with complex model classes. In this paper, we propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB), for RL with general function approximation. Our key algorithmic design includes (1) a general deterministic policy-switching strategy that achieves low switching cost, (2) a monotonic value function structure with carefully controlled function class complexity, and (3) a variance-weighted regression scheme that exploits historical trajectories with high data efficiency. MQL-UCB achieves minimax optimal regret of Õ(d√(HK)) when K is sufficiently large and near-optimal policy switching cost of Õ(dH), where d is the eluder dimension of the function class, H is the planning horizon, and K is the number of episodes.
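To make two of the ingredients named above concrete, here is a minimal sketch in a one-dimensional linear setting. It is an illustration under simplifying assumptions, not the MQL-UCB algorithm: the function names `weighted_ridge_1d` and `count_switches` are hypothetical, the regression is ordinary inverse-variance-weighted ridge regression, and the update rule is a standard information-doubling criterion swapped in for the paper's switching strategy for general function classes.

```python
def weighted_ridge_1d(xs, ys, variances, lam=1.0):
    """Scalar theta minimizing sum_i (y_i - theta*x_i)^2 / var_i + lam * theta^2.

    Samples with a smaller variance estimate receive a larger weight.
    """
    num = sum(x * y / v for x, y, v in zip(xs, ys, variances))
    den = lam + sum(x * x / v for x, v in zip(xs, variances))
    return num / den


def count_switches(xs, variances, lam=1.0):
    """Count policy updates under an information-doubling rule.

    Assumption for illustration only: the policy is recomputed whenever the
    accumulated weighted information has at least doubled since the last update.
    """
    info = lam              # accumulated weighted information so far
    info_at_switch = lam    # information level at the most recent update
    switches = 0
    for x, v in zip(xs, variances):
        info += x * x / v
        if info >= 2.0 * info_at_switch:
            switches += 1
            info_at_switch = info
    return switches
```

With unit features and unit variances over 100 episodes, `count_switches([1.0] * 100, [1.0] * 100)` updates the policy only 6 times, which illustrates how a doubling-style criterion keeps the number of switches logarithmic in the number of episodes in this toy setting.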

Country:
- North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre:
- Research Report > Experimental Study (0.92)

Industry:
- Energy > Oil & Gas
  - Upstream (0.34)
- Health & Medicine (0.54)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Reinforcement Learning (1.00)
  - Representation & Reasoning > Uncertainty
    - Fuzzy Logic (0.61)