Learning to Reason with Relational Video Representation for Question Answering

Thao Minh Le, Vuong Le, Svetha Venkatesh, Truyen Tran

arXiv.org Artificial Intelligence 

How does a machine learn to reason about the content of a video in answering a question? A Video QA system must simultaneously understand language, represent visual content over space-time, iteratively transform these representations in response to lingual content in the query, and finally arrive at a sensible answer. While recent advances in textual and visual question answering have come up with sophisticated visual representations and neural reasoning mechanisms, major challenges in Video QA remain in the dynamic grounding of concepts, relations and actions to support the query.

While acquiring visual knowledge of objects and relations from static images has advanced hugely in recent years [7], deep video understanding remains elusive. Compared to static images, video poses new challenges, primarily due to the inherent dynamic nature of visual content over time [6, 34]. At the lowest level, we have correlated motion and appearance [6]. At a higher level, we have objects that are persistent over time, actions that are local in time, and relations that can span an extended length. Thus searching for an answer in a video entails solving simultaneous sub-tasks in both the visual and lingual spaces, probably in an iterative and compositional fashion.
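The iterative, query-conditioned transformation of visual representations described above can be sketched as a small attention loop. This is a minimal illustration only, not the paper's model: the dot-product scoring, the residual-style state update, and all names and dimensions are assumptions made for clarity.

```python
import math

def softmax(scores):
    # numerically stable softmax over a list of scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def iterative_reasoning(clip_feats, query, steps=3):
    """Repeatedly attend over clip features conditioned on the current
    reasoning state, which is initialized from the query encoding.
    (A generic sketch of iterative visual-lingual reasoning.)"""
    state = list(query)
    weights = []
    for _ in range(steps):
        # attention weights: relevance of each clip to the current state
        weights = softmax([dot(state, c) for c in clip_feats])
        # attended summary of the video for this reasoning step
        summary = [sum(w * c[i] for w, c in zip(weights, clip_feats))
                   for i in range(len(state))]
        # update the state with a simple residual-style mix (an assumption)
        state = [0.5 * s + 0.5 * m for s, m in zip(state, summary)]
    return state, weights

# Toy example: three 2-d clip features and a 2-d query encoding
clips = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.2]
state, weights = iterative_reasoning(clips, query)
```

After each step the state blends in the clips it attends to, so clips aligned with the query progressively dominate the attention distribution, mimicking the compositional, multi-step grounding the text argues for.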
