Reviews: Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning

Oct-7-2024, 08:52:47 GMT–Neural Information Processing Systems

The authors first present a negative result: soft policy updates that use multi-step greedy policies do not guarantee policy improvement. They then propose an algorithm with cautious soft updates (update toward the kappa-greedy policy only when improvement is assured; otherwise stay with the one-step greedy policy) and show that it converges to the optimal policy. Lastly, the authors study hard updates by extending APIs to the multi-step greedy policy setting.

Comments:

1. Theorem 2 presents an interesting and surprising result. Although the authors give the example in the proof sketch, could they provide more intuition behind it? Based on the theorem, it seems that for the multi-step greedy policy h needs to be bigger than 2. So I suspect that h = 2 will still work (meaning a small alpha could exist)? Obviously h = 1 works, but then why, once h reaches 3, does the soft update suddenly stop working unless alpha is exactly equal to 1? I would expect a larger alpha to be required as h gets larger.
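The question in comment 1 can be probed numerically on a toy problem. The following is a minimal sketch, not the authors' code: it assumes a small random tabular MDP, a fixed discount factor of 0.9, and the usual reading of the h-step greedy policy as the one-step greedy policy with respect to the value obtained after h - 1 applications of the Bellman optimality operator. All function names (`random_mdp`, `h_greedy`, `soft_update`, `improves`) are hypothetical and chosen for illustration only. The soft update mixes the current policy with the h-step greedy policy using step size alpha, and `improves` reports whether the mixed policy is at least as good as the current one in every state.

```python
import random

GAMMA = 0.9  # discount factor (an assumption made for this illustration)

def random_mdp(n_states, n_actions, seed=0):
    """Small random MDP: P[s][a] is a distribution over next states, R[s][a] a reward."""
    rng = random.Random(seed)
    P, R = [], []
    for _ in range(n_states):
        P_s, R_s = [], []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            P_s.append([x / total for x in w])
            R_s.append(rng.random())
        P.append(P_s)
        R.append(R_s)
    return P, R

def q_values(P, R, v):
    """One-step lookahead: Q[s][a] = R[s][a] + gamma * sum_t P[s][a][t] * v[t]."""
    return [[R[s][a] + GAMMA * sum(p * v[t] for t, p in enumerate(P[s][a]))
             for a in range(len(R[s]))] for s in range(len(R))]

def evaluate(P, R, pi, sweeps=2000):
    """Iterative policy evaluation for a stochastic policy pi[s][a]."""
    v = [0.0] * len(R)
    for _ in range(sweeps):
        q = q_values(P, R, v)
        v = [sum(pi[s][a] * q[s][a] for a in range(len(q[s]))) for s in range(len(q))]
    return v

def h_greedy(P, R, v, h):
    """Deterministic h-step greedy policy: one-step greedy w.r.t. T^(h-1) v."""
    for _ in range(h - 1):
        v = [max(row) for row in q_values(P, R, v)]
    pi = []
    for row in q_values(P, R, v):
        best = row.index(max(row))
        pi.append([1.0 if a == best else 0.0 for a in range(len(row))])
    return pi

def soft_update(pi, pi_h, alpha):
    """Mixture policy (1 - alpha) * pi + alpha * pi_h, state by state."""
    return [[(1 - alpha) * p + alpha * g for p, g in zip(ps, gs)]
            for ps, gs in zip(pi, pi_h)]

def improves(P, R, pi, h, alpha, tol=1e-9):
    """True if the soft update toward the h-step greedy policy does not lower any state value."""
    v_old = evaluate(P, R, pi)
    pi_new = soft_update(pi, h_greedy(P, R, v_old, h), alpha)
    v_new = evaluate(P, R, pi_new)
    return all(n >= o - tol for n, o in zip(v_new, v_old))

if __name__ == "__main__":
    P, R = random_mdp(3, 2, seed=1)
    uniform = [[0.5, 0.5] for _ in range(3)]
    for h in (1, 2, 3):
        for alpha in (0.1, 0.5, 1.0):
            print(h, alpha, improves(P, R, uniform, h, alpha))
```

Improvement is guaranteed for h = 1 with any step size and for alpha = 1 with any h, so those cases serve as sanity checks. A random MDP will not necessarily exhibit a failure for h > 1 and alpha < 1; the theorem only states that improvement is not guaranteed there, so finding a failing case may require searching over MDPs and starting policies.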

approximate and online reinforcement learning, multiple-step greedy policy, oracle

Genre:
- Instructional Material > Online (0.40)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)