Evolved Policy Gradients

Apr-20-2018, 12:30:22 GMT–@machinelearnbot

We're releasing an experimental metalearning approach called Evolved Policy Gradients, a method that evolves the loss function of learning agents, which can enable fast training on novel tasks. Agents trained with EPG can succeed at basic tasks at test time that were outside their training regime, like learning to navigate to an object on a different side of the room from where it was placed during training. EPG trains agents to have a prior notion of what constitutes making progress on a novel task. Rather than encoding prior knowledge through a learned policy network, EPG encodes it as a learned loss function[1]. Agents are then able to use this loss function, defined as a temporal-convolutional neural network, to learn quickly on a novel task. We've shown that EPG can generalize to out of distribution test time tasks, exhibiting behavior qualitatively different from other popular metalearning algorithms.

artificial intelligence, loss function, machine learning, (15 more...)

@machinelearnbot

Apr-20-2018, 12:30:22 GMT

News Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.40)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found