Improving Deep Neuroevolution via Deep Innovation Protection

Risi, Sebastian, Stanley, Kenneth O.

arXiv.org Machine Learning 

A BSTRACT Evolutionary-based optimization approaches have recently shown promising results in domains such as Atari and robot locomotion but less so in solving 3D tasks directly from pixels. This paper presents a method called Deep Innovation Protection (DIP) that allows training complex world models end-to-end for such 3D environments. The main idea behind the approach is to employ multiobjective optimization to temporally reduce the selection pressure on specific components in a world model, allowing other components to adapt. We investigate the emergent representations of these evolved networks, which learn a model of the world without the need for a specific forward-prediction loss. 1 I NTRODUCTION The ability of the brain to model the world arose from the process of evolution. It evolved because it helped organisms to survive and strive in their particular environments and not because such forward prediction was explicitly optimized for. In contrast to the emergent neural representations in nature, current world model approaches are often directly rewarded for their ability to predict future states of the environment (Schmidhuber, 1990; Ha & Schmidhuber, 2018; Hafner et al., 2018; Wayne et al., 2018). While it is undoubtedly useful to be able to explicitly encourage a model to predict what will happen next, in this paper we are interested in what type of representations can emerge from the less directed process of artificial evolution and what ingredients might be necessary to encourage the emergence of such predictive abilities. In particular, we are building on the recently introduced world model architecture introduced by Ha & Schmidhuber (2018). This agent model contains three different components: (1) a visual module, mapping high-dimensional inputs to a lower-dimensional representative code, (2) an LSTM-based memory component, and (3) a controller component that takes input from the visual and memory module to determine the agent's next action.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found