Combining Experience Replay with Exploration by Random Network Distillation

May-18-2019–arXiv.org Machine Learning

Abstract: Our work is a simple extension of the paper "Exploration by Random Network Distillation" [1]. … Among them we cite the "exploration … Our work is a simple extension of PPO/RND. We show how to … We are able to do it by using a new technique named Prioritized Oversampled Experience Replay (POER), which is built upon a definition of which experience is important and useful to replay. In POER we mix oversampling [3] with experience prioritization [4], trying to achieve an optimal balance between exploration and exploitation. In order to do this, we: …

I. INTRODUCTION

… by the effects of its actions (in the environment) while trying to maximize a cumulative return/reward. In other words, an RL agent learns how to optimally interact with the environment by receiving environmental feedback called rewards. The better an action is, the higher the reward should be. But in many scenarios rewards are very rare and difficult to get, thus making Reinforcement Learning very …
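The exploration bonus that PPO/RND builds on can be illustrated with a small sketch. In Random Network Distillation, a randomly initialised target network is frozen, a predictor network is trained to reproduce its outputs, and the prediction error is used as an intrinsic reward: it stays high for observations the agent has rarely seen and shrinks for familiar ones. The class `RNDBonus`, its parameters, and the use of linear NumPy "networks" are hypothetical simplifications chosen for brevity; this is not the authors' implementation.

```python
import numpy as np


class RNDBonus:
    """Hypothetical linear Random Network Distillation bonus (sketch only)."""

    def __init__(self, obs_dim: int, feat_dim: int = 16, lr: float = 0.5, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Fixed, randomly initialised target network: never updated.
        self.target = rng.normal(size=(obs_dim, feat_dim))
        # Predictor network, trained to imitate the target's features.
        self.predictor = 0.1 * rng.normal(size=(obs_dim, feat_dim))
        self.lr = lr

    def _error(self, obs: np.ndarray) -> np.ndarray:
        # Difference between predicted and target features for one observation.
        return obs @ self.predictor - obs @ self.target

    def intrinsic_reward(self, obs: np.ndarray) -> float:
        # Mean squared prediction error: high for unfamiliar observations.
        return float(np.mean(self._error(obs) ** 2))

    def update(self, obs: np.ndarray) -> None:
        # One gradient-descent step on the mean squared error w.r.t. the predictor.
        err = self._error(obs)
        grad = 2.0 * np.outer(obs, err) / err.size
        self.predictor -= self.lr * grad


bonus = RNDBonus(obs_dim=4)
seen = np.array([1.0, 0.0, 0.0, 0.0])    # an observation the agent keeps revisiting
novel = np.array([0.0, 1.0, 0.0, 0.0])   # an observation the predictor never trains on

before = bonus.intrinsic_reward(seen)
for _ in range(50):
    bonus.update(seen)
after = bonus.intrinsic_reward(seen)
novel_reward = bonus.intrinsic_reward(novel)

print(f"seen: {before:.4f} -> {after:.4f}, novel: {novel_reward:.4f}")
```

In this linear toy, training on `seen` only changes the predictor row that `seen` touches, so the bonus for `novel` keeps its initial value; with real nonlinear networks the predictor also generalises partially to similar observations.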
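The abstract says POER mixes oversampling [3] with experience prioritization [4] but does not define either step. As a rough illustration of the prioritization ingredient only, the sketch below uses the generic proportional scheme from Prioritized Experience Replay (Schaul et al.), where transition i is drawn with probability P(i) = p_i^α / Σ_k p_k^α and importance-sampling weights w_i = (N·P(i))^(-β) correct the resulting bias. We assume, without confirmation, that this is close to what [4] refers to; the oversampling step and the way POER combines the two are not shown. The helper `prioritized_sample` and its default `alpha`/`beta` values are hypothetical.

```python
import numpy as np


def prioritized_sample(priorities, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Proportional prioritized sampling with importance-sampling weights.

    Generic Prioritized-Experience-Replay-style sketch (hypothetical helper),
    not the POER algorithm itself.
    """
    rng = np.random.default_rng() if rng is None else rng
    scaled = np.asarray(priorities, dtype=float) ** alpha
    probs = scaled / scaled.sum()                  # P(i) = p_i^alpha / sum_k p_k^alpha
    idx = rng.choice(len(probs), size=batch_size, p=probs)
    # Importance-sampling correction w_i = (N * P(i))^(-beta), scaled so the largest
    # weight in the batch is 1.
    weights = (len(probs) * probs[idx]) ** (-beta)
    weights /= weights.max()
    return idx, weights


rng = np.random.default_rng(0)
priorities = [1.0, 1.0, 1.0, 100.0]   # the last transition has a much larger priority
idx, w = prioritized_sample(priorities, batch_size=1000, alpha=1.0, rng=rng)
share_high = np.mean(idx == 3)
print(f"share of high-priority samples: {share_high:.2f}")
```

High-priority transitions are replayed far more often, while their smaller weights keep the gradient update from over-counting them; `alpha` interpolates between uniform replay (0) and fully proportional replay (1).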

artificial intelligence, machine learning, reinforcement learning, …

Country:
- Europe > Italy (0.14)

Genre:
- Research Report (0.40)

Industry:
- Leisure & Entertainment > Games (0.70)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning > Reinforcement Learning (1.00)