Video Question Answering on Screencast Tutorials

Wentian Zhao, Seokhwan Kim, Ning Xu, Hailin Jin

Aug-2-2020–arXiv.org Artificial Intelligence

This paper presents a new video question answering task on screencast tutorials. We introduce a dataset of question, answer, and context triples drawn from the tutorial videos for a software product. Unlike other video question answering work, all answers in our dataset are grounded in a domain knowledge base. A one-shot recognition algorithm is designed to extract visual cues, which helps improve video question answering performance. We also propose several baseline neural network architectures based on various aspects of the video contexts in the dataset. The experimental results demonstrate that our proposed models significantly improve question answering performance by incorporating multi-modal contexts and domain knowledge.
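
As a rough illustration of what a question, answer, and context triple grounded in a knowledge base could look like, here is a minimal Python sketch. The class name, field names, identifiers, and placeholder values are all hypothetical and are not taken from the paper's dataset; the sketch only shows the idea that every answer refers to an entry in a domain knowledge base.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class QAExample:
    """One (question, answer, context) triple. All field names are assumptions."""
    question: str
    answer_id: str  # hypothetical: key of the answer's entry in the knowledge base
    context: str    # hypothetical: e.g. transcript text near the relevant video segment


def is_grounded(example: QAExample, knowledge_base: Dict[str, str]) -> bool:
    """Return True if the example's answer refers to an existing knowledge base entry."""
    return example.answer_id in knowledge_base


# Placeholder knowledge base and example; the contents are illustrative only.
knowledge_base = {
    "kb:001": "placeholder entry A",
    "kb:002": "placeholder entry B",
}
example = QAExample(
    question="placeholder question",
    answer_id="kb:001",
    context="placeholder context",
)
```

Under this assumed layout, calling is_grounded(example, knowledge_base) checks the grounding property for a single triple, and the same check could be run over a whole dataset to confirm that no answer falls outside the knowledge base.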

machine learning, natural language, question answering

Country:
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- North America > United States
  - Nevada > Clark County
    - Las Vegas (0.04)
  - Louisiana > Orleans Parish
    - New Orleans (0.04)
  - California > Los Angeles County
    - Los Angeles (0.14)
- Europe
  - United Kingdom > Wales
    - Cardiff (0.04)
  - France > Provence-Alpes-Côte d'Azur
    - Alpes-Maritimes > Nice (0.04)
- Asia > South Korea
  - Seoul > Seoul (0.04)

Genre:
- Research Report > New Finding (0.48)

Industry:
- Education > Educational Technology (0.31)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Question Answering (1.00)
  - Machine Learning > Neural Networks (1.00)