Solving the scalarization issues of Advantage-based Reinforcement Learning Algorithms

Galatolo, Federico A., Cimino, Mario G. C. A., Vaglini, Gigliola

Apr-8-2020–arXiv.org Machine Learning

In this paper we investigate some of the issues that arise from the scalarization of the multi-objective optimization problem in the Advantage Actor Critic (A2C) reinforcement learning algorithm. We show how a naive scalarization leads to gradient overlapping, and we argue that the entropy regularization term merely injects uncontrolled noise into the system. We propose two methods: one that avoids gradient overlapping (NOG) while keeping the same loss formulation, and one that avoids noise injection (TE) while generating action distributions with a desired entropy. A comprehensive pilot experiment shows that our proposed methods speed up training by 210%. We argue that the proposed solutions can be applied to all Advantage-based reinforcement learning algorithms.
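For orientation, the scalarized loss the abstract refers to is commonly written as a weighted sum of a policy loss, a value loss, and an entropy bonus. The sketch below assumes that textbook A2C form; the function name `a2c_scalarized_loss` and the default coefficients are illustrative choices, not taken from the paper, and the code does not implement the proposed NOG or TE methods.

```python
import numpy as np

def a2c_scalarized_loss(logits, actions, returns, values,
                        value_coef=0.5, entropy_coef=0.01):
    """Naive scalarization of the three A2C objectives into one scalar.

    logits:  (T, A) unnormalized action scores from the actor
    actions: (T,)   indices of the actions taken
    returns: (T,)   empirical discounted returns
    values:  (T,)   critic estimates V(s_t)
    The coefficients are assumed, commonly used defaults.
    """
    # Numerically stable log-softmax over the action dimension.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    probs = np.exp(log_probs)

    # Advantage: how much better the return was than the critic predicted.
    # (With an autodiff framework this is usually treated as a constant
    # inside the policy loss; plain NumPy computes no gradients.)
    advantages = returns - values

    taken_log_probs = log_probs[np.arange(len(actions)), actions]
    policy_loss = -(taken_log_probs * advantages).mean()
    value_loss = (advantages ** 2).mean()
    entropy = -(probs * log_probs).sum(axis=1).mean()

    # One scalar for three objectives: the weights fix their trade-off.
    total = policy_loss + value_coef * value_loss - entropy_coef * entropy
    return total, policy_loss, value_loss, entropy
```

Because all three terms are collapsed into a single number, a single backward pass through shared parameters mixes their gradients, which is the setting in which the abstract's scalarization concerns arise.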

algorithm, hyperparameter optimization, optimization

Country:
- Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)

Genre:
- Research Report (0.65)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
