Reviews: Compositional Plan Vectors

Neural Information Processing Systems 

Summary

The paper proposes a new method for one-shot imitation learning that aims to generalize better, and more efficiently, to more complex tasks at test time. The main idea is to condition the policy on the difference between the embedding of a reference trajectory and the embedding of the agent's partial trajectory (for the same task, but starting from a potentially different state of the environment).

Main Comments

I found the experimental section slightly thin, and I would like to see how this method performs on at least one additional, more complex task. It would also be good to include a discussion of the types of environments where we can expect this method to perform best, and where we can expect it to fail or to perform worse than other relevant algorithms. I also think more comparisons with other approaches to one-shot imitation learning (such as Duan et al. 2017) are needed to strengthen the paper.
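To make the summarized conditioning mechanism concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation. The names `embed`, `plan_vector`, and `policy_input` are hypothetical, and summing per-step feature vectors is a toy stand-in for a learned trajectory encoder, chosen only because it is exactly additive.

```python
from typing import List, Sequence

Vector = List[float]


def embed(trajectory: Sequence[Vector], dim: int) -> Vector:
    # Toy stand-in for a learned trajectory encoder (an assumption, not the
    # paper's model): sum the per-step feature vectors, so the embedding of a
    # concatenated trajectory equals the sum of its pieces' embeddings.
    total = [0.0] * dim
    for step in trajectory:
        for i, x in enumerate(step):
            total[i] += x
    return total


def plan_vector(reference: Sequence[Vector], partial: Sequence[Vector], dim: int) -> Vector:
    # Embedding of the reference trajectory minus the embedding of what the
    # agent has done so far; intuitively, an encoding of the remaining work.
    ref = embed(reference, dim)
    done = embed(partial, dim)
    return [r - d for r, d in zip(ref, done)]


def policy_input(observation: Vector, reference: Sequence[Vector],
                 partial: Sequence[Vector], dim: int) -> Vector:
    # Hypothetical conditioning scheme: concatenate the current observation
    # with the plan vector before it is fed to a policy network.
    return list(observation) + plan_vector(reference, partial, dim)


reference = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(plan_vector(reference, [[1.0, 0.0]], dim=2))  # prints [1.0, 2.0]
```

With this additive toy embedding, the plan vector after the first step equals the embedding of the two remaining reference steps, which is the intuition behind conditioning on the difference.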