Combining Experience Replay with Exploration by Random Network Distillation

Sovrano, Francesco

arXiv.org Machine Learning 

Abstract--Our work is a simple extension of the paper "Exploration by Random Network Distillation" [1], i.e. of PPO/RND. We show how to ... We achieve this by using a new technique named Prioritized Oversampled Experience Replay (POER), which is built upon a definition of what counts as important experience worth replaying. In POER we mix oversampling [3] with experience prioritization [4], aiming for an optimal balance between exploration and exploitation. In order to do this, we: ...

I. INTRODUCTION

... the effects of its actions (in the environment) while trying to maximize a cumulative return/reward. In other words, an RL agent learns how to optimally interact with the environment by receiving environmental feedback signals called rewards. The better an action is, the higher the reward should be. But in many scenarios rewards are very rare and difficult to get, thus making Reinforcement Learning very ... Among them we cite the "exploration ...
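The abstract states that POER mixes oversampling with experience prioritization but does not give the exact rule. As a rough illustration only, and explicitly not the author's POER definition, the Python sketch below shows one generic way the two ideas can be combined. It assumes that transitions are grouped into buckets by a hypothetical `label` (for example "rewarded" versus "unrewarded"), that oversampling means every non-empty bucket is drawn with equal probability regardless of its size, and that prioritization means sampling within a bucket with probability proportional to `priority ** alpha`, in the style of prioritized experience replay. The class name, `label`, `priority` and `alpha` are all illustrative assumptions.

```python
import random
from collections import defaultdict


class OversampledPrioritizedBuffer:
    """Hypothetical sketch (not the paper's POER): oversample rare buckets
    by drawing buckets uniformly, then sample by priority inside a bucket."""

    def __init__(self, alpha=0.6):
        self.alpha = alpha  # 0 gives uniform sampling; larger values favour high priority
        self.buckets = defaultdict(list)  # label -> list of (transition, priority)

    def add(self, transition, priority, label):
        # 'label' is an assumed grouping key, e.g. "rewarded" vs "unrewarded".
        # Clamp priority so every transition keeps a non-zero chance of replay.
        self.buckets[label].append((transition, max(priority, 1e-6)))

    def sample(self, batch_size, rng=random):
        labels = [k for k, v in self.buckets.items() if v]
        if not labels:
            raise ValueError("cannot sample from an empty buffer")
        batch = []
        for _ in range(batch_size):
            # Oversampling: each non-empty bucket is equally likely,
            # so a bucket holding few transitions is over-represented.
            bucket = self.buckets[rng.choice(labels)]
            # Prioritization: inside the bucket, P(i) is proportional to p_i ** alpha.
            weights = [p ** self.alpha for _, p in bucket]
            transition, _ = rng.choices(bucket, weights=weights, k=1)[0]
            batch.append(transition)
        return batch


# Usage: one rare rewarded transition among 99 unrewarded ones.
buf = OversampledPrioritizedBuffer(alpha=0.6)
for i in range(99):
    buf.add(("unrewarded", i), priority=1.0, label="unrewarded")
buf.add(("rewarded", 0), priority=1.0, label="rewarded")
batch = buf.sample(1000, rng=random.Random(0))
share = sum(t[0] == "rewarded" for t in batch) / len(batch)
# share is roughly 0.5 here, versus about 0.01 under uniform sampling
```

Bucketing by a label is only one possible reading of "oversampling"; the point of the sketch is that the two mechanisms act at different levels (which group of experience to replay, and which transition within that group), which is one way to keep rare but informative experience from being drowned out by common experience.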
