Reviews: Safe and Efficient Off-Policy Reinforcement Learning
Neural Information Processing Systems
In particular, it bounds the performance of off-policy importance sampling as a function of a truncation coefficient, and discusses how to choose that coefficient based on the bound it proposes. The lack of any discussion of the relationship to that work makes paper 602 considerably weaker in my opinion. I would still lean towards acceptance, but only as a poster. Analyzing the convergence of the general-form off-policy updates in Equation 4 is novel and important. The theory is limited to finite state spaces in discounted MDPs (a limitation that should be stated in the abstract), but the empirical results show that the new Retrace algorithm can perform well in conjunction with value function approximation.
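To make the truncation idea concrete: Retrace(λ) truncates the per-step importance ratio at 1, using trace coefficients c_s = λ min(1, π(a_s|x_s)/μ(a_s|x_s)). Below is a minimal tabular sketch of one such update for a finite MDP, in the reviewer's finite-state setting. The function name, trajectory format, and step size are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def retrace_update(q, states, actions, rewards, pi, mu,
                   gamma=0.99, lam=1.0, alpha=0.1):
    """Sketch of one Retrace(lambda) update along a sampled trajectory.

    q:       tabular action values, shape (n_states, n_actions)
    states:  visited states, length T + 1 (last entry is the bootstrap state)
    actions, rewards: length-T trajectory generated by behaviour policy mu
    pi, mu:  target / behaviour policy probabilities, shape (n_states, n_actions)
    """
    T = len(rewards)
    # Truncated importance ratios: c_s = lam * min(1, pi/mu).
    # Cutting the ratio at 1 is what keeps the variance of the product bounded.
    c = np.array([lam * min(1.0, pi[states[s], actions[s]] / mu[states[s], actions[s]])
                  for s in range(T)])
    trace = 1.0       # running product gamma^t * c_1 ... c_t
    correction = 0.0  # accumulated off-policy-corrected TD errors
    for t in range(T):
        if t > 0:
            trace *= gamma * c[t]
        # TD error bootstrapping on the target policy's expected value
        # (assumes the final state in `states` is non-terminal).
        exp_q_next = pi[states[t + 1]] @ q[states[t + 1]]
        delta = rewards[t] + gamma * exp_q_next - q[states[t], actions[t]]
        correction += trace * delta
    q[states[0], actions[0]] += alpha * correction
    return q
```

Because each c_s ≤ 1, the trace product can only shrink, so the update stays safe however far the behaviour policy μ is from the target policy π, while still using full returns when the two agree.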