
Neural Information Processing Systems 

We use prompts to LLMs so that they act as language tools for two types of tasks in our work. The first is reading through news articles and retrieving the relevant information to caption our image sequences (Figures 6 and 7). The second is using our captions to generate event-specific question-answer pairs (Figures 8 and 9).

We conducted human validation on 144 events sampled across 15 disaster types to assess caption quality. Human evaluators were asked to classify each event into one of three categories: (1) clear alignment between images, captions, and sources; (2) mismatch; or (3) inconclusive, where the imagery was insufficient to verify caption details. Overall, 65.3% of events showed clear alignment between images, captions, and sources, 18.8% had mismatches, and 16.0% were inconclusive. Excluding inconclusive cases, 77.7% of determinable events showed alignment, indicating reasonable caption quality for LLM-generated annotations.
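To make the relationship between the overall and the "determinable-only" percentages explicit, the sketch below recomputes them from per-category event counts. The counts (94 aligned, 27 mismatched, 23 inconclusive) are not stated in the text. They are an assumption, back-derived from the reported percentages of 144 events. The function name `alignment_rates` is hypothetical.

```python
# Per-category event counts. ASSUMPTION: back-derived from the reported
# percentages over 144 events; the paper does not list these counts.
counts = {"aligned": 94, "mismatch": 27, "inconclusive": 23}


def alignment_rates(counts: dict[str, int]) -> dict[str, float]:
    """Hypothetical helper: percentage of events in each category.

    'aligned_determinable' drops inconclusive events from the denominator,
    so only events whose captions could be verified are counted.
    """
    total = sum(counts.values())
    determinable = counts["aligned"] + counts["mismatch"]
    return {
        "aligned_overall": 100 * counts["aligned"] / total,
        "mismatch_overall": 100 * counts["mismatch"] / total,
        "inconclusive_overall": 100 * counts["inconclusive"] / total,
        "aligned_determinable": 100 * counts["aligned"] / determinable,
    }


rates = alignment_rates(counts)
for name, pct in rates.items():
    print(f"{name}: {pct:.1f}%")
```

Rounded to one decimal place, these counts reproduce the reported 65.3%, 18.8%, 16.0%, and 77.7%. The three overall percentages sum to 100.1% only because of this rounding. The determinable-only rate is 94 / (94 + 27).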