Explicit Planning for Efficient Exploration in Reinforcement Learning
Liangpeng Zhang, Ke Tang, Xin Yao
Neural Information Processing Systems
Efficient exploration is crucial to achieving good performance in reinforcement learning. Existing systematic exploration strategies (R-MAX, MBIE, UCRL, etc.), despite being theoretically promising, are essentially greedy strategies that follow predefined heuristics. When the heuristics do not match the dynamics of Markov decision processes (MDPs) well, an excessive amount of time can be wasted travelling through already-explored states, lowering the overall efficiency. We argue that explicit planning for exploration can help alleviate such a problem, and propose the Value Iteration for Exploration Cost (VIEC) algorithm, which computes the optimal exploration scheme by solving an augmented MDP.
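The full VIEC construction solves an augmented MDP whose state tracks the remaining exploration demand of every state-action pair. As a rough illustration of the planning idea only, the following Python sketch runs value iteration to minimize expected exploration cost in a drastically simplified setting: every step costs one unit, and executing any still-under-explored action satisfies its demand and terminates. The names (`viec_sketch`, `under_explored`) are hypothetical and not from the paper.

```python
import numpy as np

def viec_sketch(P, under_explored, gamma=1.0, n_iter=1000, tol=1e-8):
    """Simplified value iteration for exploration cost (illustrative only).

    Simplification relative to the paper: each step costs 1, and executing
    an under-explored (s, a) pair pays one final step and terminates, so
    V[s] approximates the expected number of steps needed to reach and try
    some under-explored action from state s. Assumes at least one
    under-explored pair is reachable from every state.

    P: array of shape (S, A, S), P[s, a, s'] = transition probability.
    under_explored: set of (s, a) pairs still needing visits.
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(n_iter):
        Q = 1.0 + gamma * (P @ V)      # one step of travel cost, shape (S, A)
        for (s, a) in under_explored:
            Q[s, a] = 1.0              # trying (s, a) satisfies its demand
        V_new = Q.min(axis=1)          # act to minimize exploration cost
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    policy = Q.argmin(axis=1)          # greedy policy w.r.t. exploration cost
    return V, policy
```

In the actual VIEC algorithm the value function is indexed by the full demand vector, so satisfying one demand transitions to a smaller augmented state rather than terminating; the sketch above collapses that structure into a single absorbing goal.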