CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering

Patel, Maitreya, Gokhale, Tejas, Baral, Chitta, Yang, Yezhou

Nov-7-2022–arXiv.org Artificial Intelligence

Videos often capture objects, their visible properties, their motion, and the interactions between different objects. Objects also have physical properties such as mass, which the imaging pipeline is unable to directly capture. However, these properties can be estimated by utilizing cues from relative object motion and the dynamics introduced by collisions. In this paper, we introduce CRIPP-VQA, a new video question answering dataset for reasoning about the implicit physical properties of objects in a scene. CRIPP-VQA contains videos of objects in motion, annotated with questions that involve counterfactual reasoning about the effect of actions, questions about planning in order to reach a goal, and descriptive questions about visible properties of objects. The CRIPP-VQA test set enables evaluation under several out-of-distribution settings -- videos with objects with masses, coefficients of friction, and initial velocities that are not observed in the training distribution. Our experiments reveal a surprising and significant performance gap in terms of answering questions about implicit properties (the focus of this paper) and explicit properties of objects (the focus of prior work).

machine learning, natural language, question answering, (17 more...)

arXiv.org Artificial Intelligence

Nov-7-2022

arXiv.org PDF

Add feedback

Country:
- Asia > China (0.04)
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America
  - United States
    - Arizona (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Nevada > Clark County
      - Las Vegas (0.04)
    - Massachusetts > Suffolk County
      - Boston (0.04)
    - Hawaii > Honolulu County
      - Honolulu (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - California > Los Angeles County
      - Long Beach (0.04)
    - Washington > King County
      - Seattle (0.04)
    - New York > New York County
      - New York City (0.04)
  - Canada > British Columbia
    - Metro Vancouver Regional District > Vancouver (0.14)
- Europe
  - Ireland (0.04)
  - Italy > Veneto
    - Venice (0.04)
- Africa > Ethiopia
  - Addis Ababa > Addis Ababa (0.04)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Representation & Reasoning (1.00)
  - Natural Language > Question Answering (1.00)
  - Machine Learning > Neural Networks (0.68)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found