Reviews: Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy

Jan-22-2025, 07:33:15 GMT–Neural Information Processing Systems

Originality: The authors apply the idea that overparametrization induces local linearization, which has been documented for supervised learning, and in another submission for TD learning. In particular, they decompose the error into two terms, one due to TD, and the other due to SGD, and incorporate them in the analysis of infinite-dimensional mirror descent. The insight that the previous previous analysis for TD could be generalised to a meta algorithm that includes both TD and SGD as particular cases is key. Related work is adequately cited, and differences with previous works are clearly stated, including differences with the sister submission [5]. Quality: The submission seems technically sound, and includes detailed proofs (I just skimmed through them). This is a complete piece of work.

architecture, optimization attain globally optimal policy, submission, (10 more...)

Neural Information Processing Systems

Jan-22-2025, 07:33:15 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks (1.00)
  - Reinforcement Learning (0.77)