Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark
Joseph Heyward, João Carreira, Dima Damen, Andrew Zisserman, Viorica Pătrăucean
arXiv.org Artificial Intelligence
This year, the challenge had seven tracks (up from six last year), covering low-level and high-level tasks with language and non-language interfaces across the video, audio, and text modalities. The additional track covered hour-long video understanding and introduced a novel video QA benchmark, 1h-walk VQA. The tasks across the tracks were: object tracking, point tracking, temporal action localisation, temporal sound localisation, multiple-choice video question-answering, grounded video question-answering, and hour-long video question-answering. In this report, we summarise the challenge tasks and results, and describe the novel hour-long video QA benchmark 1h-walk VQA in detail.
Nov-29-2024