Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning

Shangtong Zhang, Bo Liu, Shimon Whiteson

May-27-2020–arXiv.org Artificial Intelligence

We present a mean-variance policy iteration (MVPI) framework for risk-averse control in a discounted infinite-horizon MDP. MVPI is flexible: any policy evaluation method and any risk-neutral control method can be dropped in off the shelf for risk-averse control, in both on- and off-policy settings. As an example instantiation of MVPI, we propose risk-averse TD3, which outperforms vanilla TD3 and many previous risk-averse control methods on challenging MuJoCo robot simulation tasks under a risk-aware performance metric. This risk-averse TD3 is the first to introduce deterministic policies and off-policy learning into risk-averse reinforcement learning, both of which are key to the performance boost we show in the MuJoCo domains. MVPI adopts a per-step reward perspective (Bisi et al., 2019) for risk-averse control, instead of the commonly used total reward perspective.
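The per-step reward perspective can be illustrated with a small sketch. It is not the authors' implementation, and the abstract does not state a formula: the sketch assumes the objective has the common form "mean of per-step rewards minus a weight times their variance", and that the per-step rewards are given as a plain sample rather than weighted by a discounted state distribution. The names `mean_variance_score`, `augmented_rewards`, and `risk_weight` are hypothetical. The second function shows one way such an objective could be handed to a risk-neutral method: for a fixed estimate `y` of the mean reward, each reward is transformed so that the ordinary mean of the transformed rewards, minus `risk_weight * y**2`, recovers the mean-variance score.

```python
from statistics import fmean, pvariance


def mean_variance_score(rewards, risk_weight):
    """Mean of per-step rewards minus risk_weight times their population variance.

    Assumed form of a mean-variance objective; risk_weight >= 0 is a
    hypothetical trade-off parameter (0 gives the risk-neutral mean).
    """
    return fmean(rewards) - risk_weight * pvariance(rewards)


def augmented_rewards(rewards, risk_weight):
    """Transform rewards so a risk-neutral learner can optimize the score.

    With y fixed to the current mean reward, each reward r becomes
    r - risk_weight * r**2 + 2 * risk_weight * y * r.
    Returns the transformed rewards and the value of y that was used.
    """
    y = fmean(rewards)
    transformed = [r - risk_weight * r * r + 2.0 * risk_weight * y * r for r in rewards]
    return transformed, y


if __name__ == "__main__":
    steady = [1.0, 1.0, 1.0, 1.0]  # same mean, no variance
    risky = [0.0, 2.0, 0.0, 2.0]   # same mean, high variance
    print(mean_variance_score(steady, 0.5))
    print(mean_variance_score(risky, 0.5))
```

Under these assumptions, two reward streams with the same mean are ranked by their variance, which is the kind of preference a risk-averse agent is meant to express.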

machine learning, reinforcement learning, variance

Country:
- North America
  - Canada > Alberta (0.14)
  - United States > Massachusetts
    - Middlesex County > Belmont (0.04)
- Europe > United Kingdom
  - England > Oxfordshire > Oxford (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.82)

Industry:
- Information Technology > Security & Privacy (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Robots (1.00)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Neural Networks > Deep Learning (0.47)
