dd77279f7d325eec933f05b1672f6a1f-Reviews.html
Neural Information Processing Systems
Summary The paper proposes a class of constrained natural actor-critic algorithms in which, for safety reasons, the policy parameters must remain within a given subregion. The idea is to run natural actor-critic algorithms, which update the policy parameters along the estimated direction of the natural policy gradient, and to project the parameters back onto the allowed set whenever they leave the safe region. The authors show that natural gradient ascent is a special case of mirror ascent; since mirror ascent is a constrained optimization algorithm, the projection can be obtained simply (and effectively) by adding constraints on the policy parameter values. Besides proving that the resulting projection is compatible with the natural policy gradient, the paper introduces a simple example and two more complex case studies to evaluate the proposed solution and to illustrate the negative effects that can arise in critical systems when either unconstrained optimization or an incorrect projection method is used.

Quality The paper is technically sound.
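
To make the projection step described in the summary concrete, here is a minimal sketch; it is not the authors' implementation. It assumes that the compatible projection is the one taken in the norm induced by the metric matrix G of the natural gradient, and it uses a single half-space constraint so that this projection has a closed form. The names `project_halfspace` and `projected_natural_step`, the matrix `G`, and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np


def project_halfspace(y, a, b, G):
    """Project y onto {x : a @ x <= b} in the norm induced by the metric G.

    Solves min_x (x - y)^T G (x - y) subject to a @ x <= b in closed form.
    Assumption: G is symmetric positive definite.
    """
    violation = a @ y - b
    if violation <= 0:
        return y.copy()  # already inside the safe region
    Ginv_a = np.linalg.solve(G, a)
    return y - (violation / (a @ Ginv_a)) * Ginv_a


def projected_natural_step(theta, grad, G, a, b, step_size):
    """One natural-gradient ascent step followed by the G-norm projection."""
    natural_grad = np.linalg.solve(G, grad)  # G^{-1} grad
    return project_halfspace(theta + step_size * natural_grad, a, b, G)


# Toy numbers (hypothetical): a 2-D parameter vector and the constraint
# theta[0] + theta[1] <= 1.
G = np.array([[1.0, 0.0], [0.0, 4.0]])
a = np.array([1.0, 1.0])
b = 1.0
theta = np.zeros(2)
grad = np.array([1.0, 4.0])

theta_new = projected_natural_step(theta, grad, G, a, b, step_size=1.0)
print(theta_new)  # approximately [0.2, 0.8]
```

In this toy case the unconstrained update lands at (1, 1). A Euclidean projection of that point would give (0.5, 0.5), which is also feasible but lies farther from the unconstrained update when distance is measured with G.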
Mar-13-2024, 21:24:24 GMT