A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation
Neural Information Processing Systems
The exploration-exploitation dilemma has been a central challenge in reinforcement learning (RL) with complex model classes. In this paper, we propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB), for RL with general function approximation. Our key algorithmic design includes (1) a general deterministic policy-switching strategy that achieves low switching cost, (2) a monotonic value function structure with carefully controlled function class complexity, and (3) a variance-weighted regression scheme that exploits historical trajectories with high data efficiency. MQL-UCB achieves minimax optimal regret of Õ(d√(HK)) when K is sufficiently large, and near-optimal policy switching cost of Õ(dH), where d is the eluder dimension of the function class, H is the planning horizon, and K is the number of episodes.
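A low-switching strategy of this kind is often implemented, in the linear special case, by replanning only when the accumulated information has grown by a constant factor. The sketch below is an illustrative determinant-doubling rule for linear features; it is not the paper's general-function-approximation criterion, and all names (`run_switch_count`, `eta`) are assumptions for this example.

```python
import numpy as np

def run_switch_count(features, d, eta=2.0, lam=1.0):
    """Count policy switches under a determinant-doubling rule.

    The policy is recomputed only when det(cov) has grown by a factor
    of eta since the last switch, so switches scale like O(d log K)
    rather than O(K).
    """
    cov = lam * np.eye(d)                    # regularized empirical covariance
    last_logdet = np.linalg.slogdet(cov)[1]  # log-det at the last policy switch
    switches = 0
    for x in features:
        cov += np.outer(x, x)                # rank-one information update
        logdet = np.linalg.slogdet(cov)[1]
        if logdet > last_logdet + np.log(eta):
            switches += 1                    # replan / switch policy here
            last_logdet = logdet
    return switches

rng = np.random.default_rng(1)
feats = rng.normal(size=(1000, 4))           # K = 1000 episodes, d = 4
n_switch = run_switch_count(feats, d=4)
```

Because the determinant can double only logarithmically many times over K episodes, the switch count stays logarithmic in K, which is the intuition behind the Õ(dH) switching-cost bound.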
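The variance-weighted regression idea can be illustrated, again in a simplified linear setting, by weighted ridge regression: each sample is weighted by its inverse noise variance, so high-confidence transitions dominate the fit. This is only a sketch of the general principle, not the paper's estimator; the function name and parameters are assumptions.

```python
import numpy as np

def variance_weighted_ridge(X, y, sigma2, lam=1.0):
    """Solve argmin_w sum_i (y_i - x_i @ w)^2 / sigma2_i + lam * ||w||^2."""
    W = np.diag(1.0 / sigma2)                      # per-sample precision weights
    A = X.T @ W @ X + lam * np.eye(X.shape[1])     # weighted Gram matrix
    b = X.T @ W @ y
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
sigma2 = rng.uniform(0.1, 2.0, size=200)           # heteroscedastic noise variances
y = X @ w_true + rng.normal(size=200) * np.sqrt(sigma2)

w_hat = variance_weighted_ridge(X, y, sigma2, lam=0.1)
```

Down-weighting noisy samples in this way is what lets variance-aware analyses sharpen the regret's dependence on H compared with unweighted least squares.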