consensus re-ranking
We thank all the reviewers for their helpful comments and for recognizing the novelty of our approach (R2-4) and its
We are glad that the reviewers found our experimental setup exhaustive (R1-4). This is not feasible with prior work, e . We will clarify and highlight these challenges in the final version. Random samples in Tab. 2, 11, and 12 show that the captions from COS-CVAE are coherent (ll. COS-CVAE has a score of 0.742 while Seq-CVAE (attn) has 0.714.