Supplementary Material: Discovering Reinforcement Learning Algorithms Junhyuk Oh Matteo Hessel Wojciech M. Czarnecki Zhongwen Xu Hado van Hasselt Satinder Singh David Silver DeepMind
–Neural Information Processing Systems
In tabular grid worlds, object locations are randomised across lifetimes but fixed within a lifetime. There are two different action spaces. The other version has only 9 movement actions. The episode terminates after a fixed number of steps (i.e., chain length), which is There is no state aliasing because all states are distinct. We trained LPGs by simulating 960 parallel lifetimes (i.e., batch size for meta-gradients), each of Rectified linear unit (ReLU) was used as activation function throughout the experiment.
Neural Information Processing Systems
Oct-2-2025, 00:37:31 GMT