Chatting Makes Perfect: Chat-based Image Retrieval Supplementary Material

Neural Information Processing Systems 

Figure 5: Dialog examples with human questions answered by two different answerers: Human (green, right) and BLIP2 (orange, left). Note that human answers tend to be longer, often with voluntarily added information.

Figure 6: Screenshot of the web interface used for collecting human answers.