Reviewer

Neural Information Processing Systems 

No, they are usually different; we will make this clearer in the revision. The inner optimisation is nested within the outer optimisation, and we will also make this explicit. The MLP baseline uses Eqs. 1 and 2, as in existing work that uses MLPs to predict sets.