Goal-Conditioned Generators of Deep Policies
Francesco Faccio, Vincent Herrmann, Aditya Ramesh, Louis Kirsch, Jürgen Schmidhuber
Goal-conditioned Reinforcement Learning (RL) aims to learn optimal policies, given goals encoded in special command inputs. Here we study goal-conditioned neural nets (NNs) that learn to generate deep NN policies in the form of context-specific weight matrices, similar to Fast Weight Programmers and other methods from the 1990s. Using context commands of the form "generate a policy that achieves a desired expected return," our NN generators combine powerful exploration of parameter space with generalization across commands to iteratively find better and better policies. A form of weight-sharing HyperNetworks and policy embeddings scales our method to generate deep NNs. Experiments show that a single learned policy generator can produce policies that achieve any return seen during training. Finally, we evaluate our algorithm on a set of continuous control tasks, where it exhibits competitive performance.
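The data flow described in the abstract (return command in, policy weights out) can be illustrated with a minimal sketch. This is not the authors' implementation: the generator here is a hypothetical per-weight affine map of the command rather than a HyperNetwork, the policy is a single linear layer, and training is omitted entirely. The names `make_generator`, `generate_policy`, and `act` are assumptions made for illustration.

```python
import random


def make_generator(obs_dim, act_dim, seed=0):
    """Create toy generator parameters: one (slope, bias) pair per policy weight.

    Assumption: a real generator would be a deep NN; an affine map keeps the sketch short.
    """
    rng = random.Random(seed)
    n_weights = obs_dim * act_dim
    slope = [rng.gauss(0.0, 0.1) for _ in range(n_weights)]
    bias = [rng.gauss(0.0, 0.1) for _ in range(n_weights)]
    return slope, bias


def generate_policy(generator, command, obs_dim, act_dim):
    """Map a scalar command (the desired return) to a weight matrix for a linear policy."""
    slope, bias = generator
    flat = [s * command + b for s, b in zip(slope, bias)]
    # Reshape the flat weight vector into act_dim rows of obs_dim entries.
    return [flat[i * obs_dim:(i + 1) * obs_dim] for i in range(act_dim)]


def act(weights, observation):
    """Run the generated linear policy: action = W @ observation."""
    return [sum(w * o for w, o in zip(row, observation)) for row in weights]


generator = make_generator(obs_dim=3, act_dim=2)
low_return_policy = generate_policy(generator, command=0.0, obs_dim=3, act_dim=2)
high_return_policy = generate_policy(generator, command=1.0, obs_dim=3, act_dim=2)
action = act(high_return_policy, [1.0, 0.0, 0.0])
```

In this sketch, changing the command changes the generated weights, which is the mechanism the abstract relies on when asking the generator for policies with higher returns. How the generator is trained so that a command actually yields that return is left out here.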
Jul-4-2022
- Country:
- Asia > Middle East
- Saudi Arabia > Mecca Province > Thuwal (0.04)
- Europe > Switzerland (0.04)
- North America > United States
- California > San Diego County
- San Diego (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- New York > New York County
- New York City (0.04)
- Genre:
- Research Report (0.64)
- Technology: