Multimodal Datasets and Benchmarks for Reasoning about Dynamic Spatio-Temporality in Everyday Environments

Ugai, Takanori; Hara, Kensho; Egami, Shusaku; Fukuda, Ken

Sep-16-2024–arXiv.org Artificial Intelligence

We used a 3D simulator to create artificial video data with standardized annotations, with the aim of supporting the development of Embodied AI. Our question answering (QA) dataset measures the extent to which a robot can understand human behavior and the environment in a home setting. Preliminary experiments suggest that the dataset is useful for measuring an AI's comprehension of daily life.
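To make concrete what it means for a QA dataset to "measure" comprehension, the sketch below scores a model's answers against gold answers with normalized exact-match accuracy. This is a minimal illustration under stated assumptions, not the authors' evaluation protocol: the record fields (`video_id`, `question`, `answer`), the helper names, and the choice of exact match as the metric are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class QAItem:
    """One hypothetical QA record tied to a simulated video clip."""
    video_id: str  # hypothetical: identifier of the generated video
    question: str  # natural-language question about the scene or activity
    answer: str    # gold answer


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def exact_match_accuracy(items: list[QAItem],
                         predictions: dict[tuple[str, str], str]) -> float:
    """Return the fraction of items whose prediction matches the gold answer.

    `predictions` maps a hypothetical (video_id, question) key to the model's
    answer; a missing prediction counts as wrong. An empty item list scores 0.0.
    """
    if not items:
        return 0.0
    correct = 0
    for item in items:
        pred = predictions.get((item.video_id, item.question), "")
        if normalize(pred) == normalize(item.answer):
            correct += 1
    return correct / len(items)


if __name__ == "__main__":
    items = [
        QAItem("v001", "Where did the person put the cup?", "on the table"),
        QAItem("v001", "What did the person pick up first?", "a book"),
    ]
    preds = {
        ("v001", "Where did the person put the cup?"): "On the  table",
        ("v001", "What did the person pick up first?"): "a cup",
    }
    print(exact_match_accuracy(items, preds))  # first answer matches, second does not
```

Exact match is only the simplest choice; free-form answers about everyday activities are often scored with softer matching or multiple-choice formats instead, and which one suits a given benchmark depends on how its answers are annotated.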

annotation, dynamic spatio-temporality, multimodal dataset and benchmark

Country:
- North America
  - United States > Washington
    - King County > Seattle (0.04)
  - Puerto Rico > Peñuelas
    - Peñuelas (0.04)
- Asia > Japan
  - Honshū > Kantō
    - Tokyo Metropolis Prefecture > Tokyo (0.04)
    - Kanagawa Prefecture (0.04)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Machine Learning > Neural Networks (0.48)
  - Natural Language > Large Language Model (0.47)