 plastic weight


Learning To Play Atari Games Using Dueling Q-Learning and Hebbian Plasticity

Salehin, Md Ashfaq

arXiv.org Artificial Intelligence

In this work, an advanced deep reinforcement learning architecture is used to train neural network agents to play Atari games. Given only the raw game pixels, action space, and reward information, the system can train agents to play any Atari game. The system first uses deep Q-networks and dueling Q-networks to train efficient agents, the same techniques used by DeepMind to train agents that beat human players in Atari games. As an extension, plastic neural networks are used as agents, and their feasibility is analyzed in this scenario. The plasticity implementation is based on backpropagation and the Hebbian update rule. Plastic neural networks support lifelong learning after the initial training, which makes them highly suitable for adaptive learning environments. As a new analysis of plasticity in this context, this work might provide valuable insights and direction for future work.
Reinforcement learning is a computational technique in which an agent learns by directly interacting with its environment without having a complete model of the environment [1]. It is a good example of an adaptive system: the agent learns to make decisions and take actions in an environment in order to maximize a reward, which acts as feedback from the environment to the agent. Well-crafted reinforcement learning agents with optimized training loops are known to learn complex tasks, such as playing computer games. In previous work, a CNN-based agent was trained using discounted policy gradients, where all the rewards in an episode were fed to the agent as training data after discounting by a factor [2]. Although this approach served as a good starting point, it is not suitable for learning to control complex environments, such as Atari games. A better implementation is possible using the Q-Learning algorithm, which is based on the Bellman equation [3].
The Bellman equation is based on the Markov decision process [4] and states that the optimal value of a state is equal to the immediate reward plus the discounted expected optimal value of the next state under the optimal policy. Solving the Bellman equation directly requires all reward values and transition probabilities to be known in advance, whereas the Q-Learning algorithm works with Q-Values, which are initialized randomly and improved gradually from experience.
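The Q-Value update described above can be sketched in tabular form. This is a minimal illustration, not the paper's implementation (which uses deep and dueling Q-networks on raw pixels): the five-state chain environment, the function name `q_learning`, and all hyperparameter values are assumptions chosen only to keep the example small.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain.

    States are 0..n_states-1, actions are 0 (left) and 1 (right).
    Entering the last state gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    # Q-Values start as small random numbers and are refined from experience.
    q = [[rng.uniform(0.0, 0.01) for _ in range(2)] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy action selection (assumed exploration scheme).
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # The terminal state has no future value.
            future = 0.0 if next_state == goal else max(q[next_state])
            # Bellman-style target: immediate reward plus discounted best next value.
            target = reward + gamma * future
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    for s, (left, right) in enumerate(q_learning()):
        print(f"state {s}: Q(left)={left:.3f}  Q(right)={right:.3f}")
```

No transition probabilities or reward table are given to the learner; it only sees sampled transitions, which is the practical difference from solving the Bellman equation directly. A deep Q-network replaces the table `q` with a neural network that maps a state to its action values.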


Learning to acquire novel cognitive tasks with evolution, plasticity and meta-meta-learning

Miconi, Thomas

arXiv.org Artificial Intelligence

In meta-learning, networks are trained with external algorithms to learn tasks that require acquiring, storing and exploiting unpredictable information for each new instance of the task. However, animals are able to pick up such cognitive tasks automatically, as a result of their evolved neural architecture and synaptic plasticity mechanisms. Here we evolve neural networks, endowed with plastic connections, over a sizeable set of simple meta-learning tasks based on a framework from computational neuroscience. The resulting evolved network can automatically acquire ...
In one method, the "inner loop" stores information in the time-varying activities of a recurrent network, which is slowly optimized in the "outer loop" over many episodes [Hochreiter et al., 2001, Wang et al., 2016, Duan et al., 2016]. A biological interpretation of this method is that the inner loop represents the within-episode self-sustaining activity of cerebral cortex, while the outer loop represents lifetime sculpting of neural connections by value-based neural plasticity (this interpretation is explored in detail by Wang et al. [2018]).