"Is This It?": Towards Ecologically Valid Benchmarks for Situated Collaboration
Dan Bohus, Sean Andrist, Yuwei Bao, Eric Horvitz, Ann Paradiso
arXiv.org Artificial Intelligence
We report initial work towards constructing ecologically valid benchmarks to assess the capabilities of large multimodal models for engaging in situated collaboration. In contrast to existing benchmarks, in which question-answer pairs are generated post hoc over preexisting or synthetic datasets via templates, human annotators, or large language models (LLMs), we propose and investigate an interactive system-driven approach, where the questions are generated by users in context, during their interactions with an end-to-end situated AI system. We illustrate how the questions that arise are different in form and content from questions typically found in existing embodied question answering (EQA) benchmarks and discuss new real-world challenge problems brought to the fore.

To track the performance of emerging models and understand their capabilities, the research community has developed a variety of benchmarks for video- and embodied-question answering [7, 9, 11, 12, 14, 20, 22]. These benchmarks are typically constructed by identifying a preexisting multimodal dataset (or creating a synthetic one via a virtual environment), and then generating question-answer pairs from templates, human annotators, or via LLMs. The questions are designed to probe model capabilities along various dimensions, such as spatial understanding, episodic memory, and the recognition of objects and their attributes. While these benchmarks provide useful probes for model competency, we argue that they do not accurately capture the types of questions users ask when engaged in a real-time task, and thus do not reflect
Aug-30-2024
- Country:
  - Asia
  - North America
    - Costa Rica > San José Province
      - San José (0.05)
    - United States
      - Michigan > Washtenaw County
        - Ann Arbor (0.04)
      - New York > New York County
        - New York City (0.04)
      - Pennsylvania > Allegheny County
        - Pittsburgh (0.04)
  - Oceania > Australia
    - New South Wales > Sydney (0.04)
- Genre:
  - Research Report (0.51)