Collaborating Authors

Reinforcement Learning

A Omitted Proofs

Neural Information Processing Systems

The proofs of these propositions are extended from Berlekamp (1968). Note that both the oracle's preference feedback and

We adopt the environment setting of Rothfuss et al. (2019): MuJoCo locomotion tasks in which the reward function is varied to create a multi-task setting. The training and testing tasks are randomly generated with a fixed random seed. During meta-training, the meta-RL algorithm has full access to interaction with the environment.
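The reproducible task generation described above can be illustrated with a minimal sketch. It assumes, hypothetically, that tasks differ only in a scalar target velocity that enters the reward, which is one common way to vary rewards in MuJoCo locomotion while keeping the dynamics fixed. The names `Task`, `reward`, and `make_task_split`, the velocity range, and the split sizes are illustrative assumptions, not the actual settings of Rothfuss et al. (2019).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical task parameterization: tasks differ only in the reward,
    # here through a target forward velocity; dynamics are shared.
    target_velocity: float


def reward(task: Task, forward_velocity: float, ctrl_cost: float) -> float:
    # Hypothetical reward: penalize deviation from the task's target
    # velocity, plus a control cost supplied by the caller.
    return -abs(forward_velocity - task.target_velocity) - ctrl_cost


def make_task_split(n_train: int, n_test: int, seed: int = 0,
                    low: float = 0.0, high: float = 3.0):
    # One RNG seeded once: the same seed always yields the same
    # training and testing task sets.
    rng = random.Random(seed)
    tasks = [Task(rng.uniform(low, high)) for _ in range(n_train + n_test)]
    return tasks[:n_train], tasks[n_train:]


if __name__ == "__main__":
    train_tasks, test_tasks = make_task_split(n_train=40, n_test=10, seed=1)
    print(len(train_tasks), len(test_tasks))  # 40 10
```

Drawing all tasks from a single seeded generator and then slicing keeps the train/test split fixed across runs, so different meta-RL algorithms can be compared on identical task sets.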