An Alternative to Variance: Gini Deviation for Risk-averse Policy Gradient

Luo, Yudong, Liu, Guiliang, Poupart, Pascal, Pan, Yangchen

Nov-2-2023–arXiv.org Artificial Intelligence

Restricting the variance of a policy's return is a popular choice in risk-averse Reinforcement Learning (RL) because variance has a clear mathematical definition and is easy to interpret. Traditional methods directly restrict the variance of the total return, while recent methods restrict the per-step reward variance as a proxy. We thoroughly examine the limitations of these variance-based methods, such as sensitivity to numerical scale and hindering of policy learning, and propose an alternative risk measure, Gini deviation, as a substitute. We study various properties of this new risk measure and derive a policy gradient algorithm to minimize it. Empirical evaluation in domains where risk-aversion can be clearly defined shows that our algorithm mitigates the limitations of variance-based risk measures and achieves high return with low risk, in terms of both variance and Gini deviation, when other methods fail to learn a reasonable policy.
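
To make the point about numerical scale concrete, the following is a minimal sketch, not the paper's algorithm. It assumes the standard definition of Gini deviation as half the expected absolute difference between two independent copies of the return, and estimates it from a sample with the usual sorted-sample formula. The function name `gini_deviation` and the sample returns are hypothetical, chosen only for illustration.

```python
from statistics import variance


def gini_deviation(samples):
    """Sample estimate of Gini deviation (assumed definition:
    0.5 * E|X1 - X2| for independent copies X1, X2).

    Uses the identity sum_{i<j} |x_i - x_j| = sum_i (2i - n - 1) * x_(i)
    over the sorted sample (1-indexed), so the cost is O(n log n).
    """
    xs = sorted(samples)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    # Mean absolute difference over distinct pairs is 2 * weighted / (n * (n - 1));
    # Gini deviation is half of that.
    return weighted / (n * (n - 1))


# Hypothetical episode returns, and the same returns with rewards scaled by 10.
returns = [1.0, 2.0, 3.0, 4.0]
scaled = [10.0 * r for r in returns]

# Gini deviation grows linearly with the reward scale; variance grows quadratically.
gd_ratio = gini_deviation(scaled) / gini_deviation(returns)
var_ratio = variance(scaled) / variance(returns)
```

Under these assumptions, multiplying every reward by a constant c multiplies the Gini deviation by c but the variance by c squared, which is one way to read the abstract's remark that variance-based objectives are sensitive to numerical scale.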

gradient, value function, variance

Country:
- North America > Canada
  - Alberta (0.14)
  - Ontario (0.04)
- Europe > United Kingdom
  - England > Oxfordshire > Oxford (0.04)
- Asia > China
  - Guangdong Province > Shenzhen (0.04)
  - Hong Kong (0.04)

Genre:
- Research Report (0.82)

Industry:
- Government (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Optimization (0.92)
  - Machine Learning
    - Statistical Learning (0.92)
    - Reinforcement Learning (0.89)
