Supplemental: A Benchmark for Compositional Text-to-image Retrieval