An information-theoretic perspective on intrinsic motivation in reinforcement learning: a survey

Aubret, Arthur, Matignon, Laetitia, Hassas, Salima

arXiv.org Artificial Intelligence 

Traditionally, an agent maximizes a reward defined according to the task to be performed: it may be a score when the agent learns to solve a game, or a distance function when the agent learns to reach a goal. The reward is then considered extrinsic (or a feedback signal) because the reward function is provided by an expert specifically for the task. With an extrinsic reward, many spectacular results have been obtained on Atari games [Bellemare et al. 2015] with the Deep Q-Network (DQN) [Mnih et al. 2015] through the integration of deep learning into RL, leading to deep reinforcement learning (DRL). However, despite the recent improvements of DRL approaches, they are most of the time unsuccessful when the rewards are sparsely scattered in the environment, as the agent is then unable to learn the desired behavior for the targeted task [Francois-Lavet et al. 2018]. Moreover, the behaviors learned by the agent are hardly reusable, both within the same task and across many different tasks [Francois-Lavet et al. 2018]. It is difficult for an agent to generalize the learned skills in order to make high-level decisions in the environment. For example, such a skill could be to go to the door using primitive actions consisting of moving in the four cardinal directions, or to move forward by controlling the different joints of a humanoid robot, as in the robotic simulator MuJoCo [Todorov et al. 2012]. In contrast to RL, developmental learning [Cangelosi and Schlesinger 2018; Oudeyer and Smith 2016; Piaget and Cook 1952] is based on the observation that babies, or more broadly organisms, acquire new skills by spontaneously exploring their environment [Barto 2013; Gopnik et al. 1999].
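To make the notion of extrinsic reward and the sparse-reward difficulty concrete, the minimal sketch below shows a tabular Q-learning agent that maximizes a purely extrinsic reward delivered by a toy chain environment, where the only non-zero reward sits at the far end of the chain. The environment, class and function names (`SparseChainEnv`, `q_learning`) are hypothetical illustrations and are not taken from the survey; the survey itself concerns intrinsic rewards, which this sketch deliberately omits.

```python
import random
from collections import defaultdict

class SparseChainEnv:
    """Chain of states 0..length-1; reward 1 only on reaching the last state."""
    def __init__(self, length=10):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 0 moves left, action 1 moves right
        delta = -1 if action == 0 else 1
        self.state = min(max(self.state + delta, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # extrinsic reward: sparse task feedback
        return self.state, reward, done


def q_learning(env, episodes=500, max_steps=200, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Learn action values from the extrinsic reward alone (epsilon-greedy)."""
    q = defaultdict(lambda: [0.0, 0.0])
    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            if random.random() < epsilon or q[s][0] == q[s][1]:
                a = random.randrange(2)        # explore / break ties randomly
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2, r, done = env.step(a)
            target = r + (0.0 if done else gamma * max(q[s2]))
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q


if __name__ == "__main__":
    q = q_learning(SparseChainEnv())
    print([round(max(q[s]), 3) for s in range(10)])
```

Because the agent only ever observes a reward at the final state, learning relies entirely on undirected exploration reaching that state by chance; with a longer chain (or a high-dimensional state space) this becomes prohibitively unlikely, which is the sparse-reward problem that intrinsic motivation methods aim to address.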
