Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems 

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality, and significance.

This paper derives policy gradient algorithms for risk-sensitive MDPs under the CVaR criterion, a recent and popular risk measure. First, the authors derive gradients for the objective based on a Lagrangian relaxation of the constrained optimization problem. This naturally yields a policy gradient algorithm in which the expected return appearing in the gradient is estimated from full trajectories (REINFORCE-like). They then propose a scheme to obtain incremental actor-critic versions, in which the critic estimates the value function (and related quantities) of an augmented MDP that is convenient for gradient estimation.
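For concreteness, here is a minimal sketch of what such a REINFORCE-like estimator for the Lagrangian could look like; the toy one-step environment, softmax policy, and all hyperparameters below are illustrative assumptions of this review, not the authors' setup. It uses the standard Rockafellar-Uryasev representation CVaR_alpha(D) = min_nu { nu + E[(D - nu)^+] / (1 - alpha) }, so the relaxed objective is L(theta, nu, lambda) = E[D^theta] + lambda * (nu + E[(D^theta - nu)^+] / (1 - alpha) - beta), optimized by descending in (theta, nu) and ascending in lambda:

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative toy problem (an assumption of this sketch): a one-step MDP
    # with three actions whose costs are Gaussian; the policy is a softmax.
    COST_MEANS = np.array([1.0, 0.5, 0.8])
    COST_STDS  = np.array([0.1, 1.5, 0.3])
    ALPHA, BETA = 0.95, 1.5   # CVaR confidence level and constraint threshold
    N = 4096                  # trajectories per gradient estimate

    def policy_probs(theta):
        z = theta - theta.max()
        p = np.exp(z)
        return p / p.sum()

    def lagrangian_gradients(theta, nu, lam):
        """Likelihood-ratio estimates of the gradients of
        L = E[D] + lam * (nu + E[(D - nu)^+] / (1 - ALPHA) - BETA)."""
        p = policy_probs(theta)
        a = rng.choice(len(p), size=N, p=p)
        costs = rng.normal(COST_MEANS[a], COST_STDS[a])  # cost D of each rollout
        excess = np.maximum(costs - nu, 0.0)             # (D - nu)^+
        score = np.eye(len(p))[a] - p                    # grad log pi(a) for softmax
        weights = costs + lam * excess / (1.0 - ALPHA)   # per-trajectory weight
        g_theta = (score * weights[:, None]).mean(axis=0)
        g_nu = lam * (1.0 - (excess > 0).mean() / (1.0 - ALPHA))  # subgradient in nu
        g_lam = nu + excess.mean() / (1.0 - ALPHA) - BETA         # constraint violation
        return g_theta, g_nu, g_lam

    # Saddle-point iteration: descend in (theta, nu), ascend in lambda >= 0.
    theta, nu, lam = np.zeros(3), 1.0, 0.5
    for _ in range(2000):
        g_t, g_n, g_l = lagrangian_gradients(theta, nu, lam)
        theta -= 0.05 * g_t
        nu    -= 0.05 * g_n
        lam    = max(0.0, lam + 0.01 * g_l)
    print(policy_probs(theta))  # mass should shift away from the heavy-tailed arm

Note this sketch needs full trajectories (here, single-step rollouts) per update, which is exactly the limitation that motivates the incremental actor-critic variants summarized above.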