PathNet
Measuring Catastrophic Forgetting in Neural Networks
Kemker, Ronald (Rochester Institute of Technology) | McClure, Marc (Rochester Institute of Technology) | Abitino, Angelina (Swarthmore College) | Hayes, Tyler L. (Rochester Institute of Technology) | Kanan, Christopher (Rochester Institute of Technology)
Deep neural networks are used in many state-of-the-art systems for machine perception. Once a network is trained to do a specific task, e.g., bird classification, it cannot easily be trained to do new tasks, e.g., incrementally learning to recognize additional bird species or learning an entirely different task such as flower recognition. When new tasks are added, typical deep neural networks are prone to catastrophically forgetting previous tasks. Networks that are capable of assimilating new information incrementally, much like how humans form new memories over time, will be more efficient than re-training the model from scratch each time a new task needs to be learned. There have been multiple attempts to develop schemes that mitigate catastrophic forgetting, but these methods have not been directly compared, the tests used to evaluate them vary considerably, and these methods have only been evaluated on small-scale problems (e.g., MNIST). In this paper, we introduce new metrics and benchmarks for directly comparing five different mechanisms designed to mitigate catastrophic forgetting in neural networks: regularization, ensembling, rehearsal, dual-memory, and sparse-coding. Our experiments on real-world images and sounds show that the mechanism(s) that are critical for optimal performance vary based on the incremental training paradigm and type of data being used, but they all demonstrate that the catastrophic forgetting problem is not yet solved.
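The abstract's idea of directly measuring forgetting can be boiled down to a simple ratio: accuracy retained on the original task across later training sessions, normalized by an ideal (offline-trained) accuracy. A minimal sketch of such a metric, assuming per-session accuracy scores are already available (the function and argument names here are illustrative, not the paper's code):

```python
def forgetting_score(base_task_accs, ideal_acc):
    """Mean accuracy retained on the first task, measured after each
    subsequent training session and normalized by an ideal offline
    accuracy. Returns a value near 1.0 if nothing is forgotten and
    near 0.0 under total catastrophic forgetting."""
    if not base_task_accs:
        raise ValueError("need at least one post-session accuracy")
    return sum(a / ideal_acc for a in base_task_accs) / len(base_task_accs)

# Example: accuracy on task 1, re-measured after three new sessions.
score = forgetting_score([0.90, 0.80, 0.70], ideal_acc=1.0)
print(round(score, 3))  # 0.8
```

A score well below 1.0 after only a few sessions is the signature of catastrophic forgetting the paper sets out to quantify.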
Pathnet is Deepmind's step to a super neural network for creating an artificial general intelligence
For artificial general intelligence (AGI) it would be efficient if multiple users trained the same giant neural network, permitting parameter reuse without catastrophic forgetting. PathNet is a first step in this direction. It is a neural network algorithm that uses agents embedded in the network whose task is to discover which parts of the network to re-use for new tasks. Agents are pathways (views) through the network that determine the subset of parameters used and updated by the forward and backward passes of the backpropagation algorithm. During learning, a tournament selection genetic algorithm selects pathways through the network for replication and mutation. Pathway fitness is the performance of that pathway measured according to a cost function.
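The tournament-selection loop described above can be sketched in a few lines: pathways compete in pairs, and the loser is overwritten by a mutated copy of the winner. This is a toy sketch, assuming a stand-in fitness function; in PathNet proper, fitness would be the trained performance of the sub-network the pathway selects, and the sizes below are illustrative:

```python
import random

random.seed(0)

L, N = 4, 10   # layers, modules per layer (illustrative sizes)
K = 3          # modules a pathway may use per layer
POP = 8        # number of candidate pathways ("agents")

def random_pathway():
    # A pathway picks K distinct module indices in each of the L layers.
    return [sorted(random.sample(range(N), K)) for _ in range(L)]

def fitness(pathway):
    # Stand-in cost function: in PathNet this would be the training
    # performance of the pathway's sub-network. Here we simply reward
    # small module indices so the sketch has something to optimize.
    return -sum(sum(layer) for layer in pathway)

def mutate(pathway, rate=0.2):
    # Re-sample each layer's module choice with some probability.
    return [sorted(random.sample(range(N), K)) if random.random() < rate
            else layer[:] for layer in pathway]

population = [random_pathway() for _ in range(POP)]
for _ in range(200):
    i, j = random.sample(range(POP), 2)  # binary tournament
    winner, loser = (i, j) if fitness(population[i]) >= fitness(population[j]) else (j, i)
    # The loser's slot is overwritten by a mutated copy of the winner.
    population[loser] = mutate(population[winner])

best = max(population, key=fitness)
```

Because only the winning pathway's parameters are trained and copied, frequently used modules are implicitly protected, which is the mechanism by which PathNet avoids overwriting earlier tasks.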
Why Google's DeepMind next-gen machine learning will stay undercover
Google's DeepMind, a London-based artificial intelligence company the search-and-cloud giant acquired in 2014, has been closely associated with Google's quest to build general AI. Earlier this year, with little publicity, DeepMind unveiled what looks like a useful step in that direction. The DeepMind team released a paper describing a neural network approach that would allow automatic "transfer learning," meaning a neural network could reuse what it already "knows" on new problems. If history is any hint, this is one innovation Google will keep close to its chest. Neural networks are a common type of machine learning that mimic, perhaps distantly, how neurons interrelate in the brain to pass information around.
DeepMind's PathNet: A Modular Deep Learning Architecture for AGI – Intuition Machine
Unlike more traditional monolithic DL networks, PathNet reuses a network that consists of many neural networks and trains them to perform multiple tasks. In the authors' experiments, a network trained on a second task learns faster than one trained from scratch. This indicates that transfer learning (or knowledge reuse) can be leveraged in this kind of network. PathNet includes aspects of transfer learning, continual learning, and multitask learning. These aspects are essential for a more continuously adaptive network and thus an approach that may lead to an AGI (speculative).
DeepMind just published a mind blowing paper: PathNet.
Each of those nine boxes is the PathNet at a different iteration. In this case, PathNet was trained on two different games using the Asynchronous Advantage Actor-Critic (A3C) algorithm. Although Pong and Alien seem very different at first, we observe positive transfer with PathNet (take a look at the score graph). First of all, we need to define the modules. Let L be the number of layers and N be the maximum number of modules per layer (the paper indicates that N is typically 3 or 4).
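Concretely, the L × N grid of modules can be sketched as a stack of small layers where only the modules named by the active pathway contribute, with their outputs summed before passing to the next layer, as the paper describes. The module type here (a plain ReLU linear map) and the sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

L, N, D = 3, 4, 8  # layers, modules per layer, feature width
# One weight matrix per module in the L x N grid.
modules = rng.standard_normal((L, N, D, D)) * 0.1

def forward(x, pathway):
    """Run x through only the modules the pathway selects.
    pathway[l] lists the active module indices in layer l;
    their outputs are summed before the next layer."""
    h = x
    for l, active in enumerate(pathway):
        h = sum(np.maximum(modules[l, m] @ h, 0.0) for m in active)
    return h

x = rng.standard_normal(D)
out = forward(x, pathway=[[0, 2], [1], [0, 1, 3]])
print(out.shape)  # (8,)
```

Only the weights along the active pathway receive gradients during training, so the rest of the grid is left untouched and remains available for other tasks.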
DeepMind just published a mind blowing paper: PathNet
Potentially describing what artificial general intelligence will look like. Since scientists started building and training neural networks, transfer learning has been a major bottleneck. Transfer learning is the ability of an AI to learn from different tasks and apply its pre-learned knowledge to a completely new task. The implication is that with this prior knowledge, the AI will perform better and train faster than de novo neural networks on the new task. DeepMind is on the path to solving this with PathNet.