A Details on meta-RL experiments

Katelyn Gao

Neural Information Processing Systems 

A.1 Setup

Environments. We consider four robotic locomotion and four manipulation environments, all with continuous action spaces. The robotic locomotion environments, based on MuJoCo [27] and OpenAI Gym [3], fall into two categories.

Varying reward functions: HalfCheetahRandVel, Walker2DRandVel. The HalfCheetahRandVel environment was introduced in Finn et al. [9]. The distribution of tasks is a distribution of HalfCheetah robots with different goal velocities, and remains the same for meta-training and meta-testing. The Walker2DRandVel environment, defined similarly to HalfCheetahRandVel, is found in the codebase for Rothfuss et al. [21].
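To make the notion of a task distribution over goal velocities concrete, the following is a minimal sketch, not the implementation used in the experiments. It assumes that goal velocities are drawn uniformly from an interval and that the reward penalizes the absolute deviation from the goal velocity plus a quadratic control cost. The names `VelocityTask`, `sample_tasks`, and `velocity_reward`, the default range `[0.0, 2.0]`, and the control cost weight are hypothetical and are not specified in this section.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class VelocityTask:
    """A task is identified by its goal velocity (hypothetical container)."""
    goal_velocity: float


def sample_tasks(num_tasks, low=0.0, high=2.0, seed=None):
    """Draw tasks from one fixed distribution.

    The uniform range [low, high] is an assumption for illustration.
    The same function would serve meta-training and meta-testing,
    since the task distribution is the same in both phases.
    """
    rng = random.Random(seed)
    return [VelocityTask(rng.uniform(low, high)) for _ in range(num_tasks)]


def velocity_reward(forward_velocity, task, action, ctrl_cost_weight=0.05):
    """Assumed reward: track the goal velocity, penalize large actions."""
    tracking = -abs(forward_velocity - task.goal_velocity)
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
    return tracking - ctrl_cost


# Usage: sample a few tasks and score one hypothetical transition per task.
tasks = sample_tasks(3, seed=0)
rewards = [velocity_reward(1.0, t, [0.1, -0.2]) for t in tasks]
```

Under these assumptions only the reward function changes between tasks; the robot dynamics are shared, which is what places both environments in the varying-reward category.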