Supplementary Material: Discovering Reinforcement Learning Algorithms

Junhyuk Oh, Matteo Hessel, Wojciech M. Czarnecki, Zhongwen Xu, Hado van Hasselt, Satinder Singh, David Silver

DeepMind

Oct-2-2025, 00:37:31 GMT–Neural Information Processing Systems

In tabular grid worlds, object locations are randomised across lifetimes but fixed within a lifetime. There are two different action spaces. [...] The other version has only 9 movement actions. The episode terminates after a fixed number of steps (i.e., chain length), which is [...] There is no state aliasing because all states are distinct. We trained LPGs by simulating 960 parallel lifetimes (i.e., batch size for meta-gradients), each of [...] The rectified linear unit (ReLU) was used as the activation function throughout the experiment.
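The "randomised across lifetimes but fixed within a lifetime" setup can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `GridWorldLifetime`, the helper `make_lifetimes`, the 5x5 grid size, the three objects, and the per-lifetime seeding scheme are all hypothetical choices made for illustration. Only the count of 960 parallel lifetimes comes from the text.

```python
import random


class GridWorldLifetime:
    """Hypothetical tabular grid world whose object layout is sampled once per lifetime."""

    def __init__(self, height, width, num_objects, seed):
        rng = random.Random(seed)
        cells = [(r, c) for r in range(height) for c in range(width)]
        # Sample distinct cells once; the layout is frozen for the whole lifetime.
        self.object_cells = tuple(rng.sample(cells, num_objects))
        self.height = height
        self.width = width

    def reset(self):
        # Start a new episode. Object locations are NOT resampled here.
        return self.object_cells


def make_lifetimes(num_lifetimes, height, width, num_objects):
    # One seed per lifetime (an assumption), so layouts differ across lifetimes.
    return [
        GridWorldLifetime(height, width, num_objects, seed=i)
        for i in range(num_lifetimes)
    ]


# 960 parallel lifetimes, i.e. the meta-gradient batch size stated in the text.
# Grid size and object count are illustrative assumptions.
lifetimes = make_lifetimes(960, height=5, width=5, num_objects=3)
```

Calling `reset()` repeatedly on one lifetime returns the same layout, while different entries of `lifetimes` generally hold different layouts.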

component description observation state index, maximum step, number, (8 more...)


