Learning to Reason with Relational Video Representation for Question Answering

Thao Minh Le, Vuong Le, Svetha Venkatesh, Truyen Tran

arXiv.org Artificial Intelligence 

How does a machine learn to reason about the content of a video in answering a question? A Video QA system must simultaneously understand language, represent visual content over space-time, iteratively transform these representations in response to lingual content in the query, and finally arrive at a sensible answer. While recent advances in textual and visual question answering have come up with sophisticated visual representations and neural reasoning mechanisms, major challenges in Video QA remain in the dynamic grounding of concepts, relations and actions to support the query.

While acquiring visual knowledge of objects and relations from static images has advanced hugely in recent years [7], deep video understanding remains elusive. Compared to static images, video poses new challenges, primarily due to the inherent dynamic nature of visual content over time [6, 34]. At the lowest level, we have correlated motion and appearance [6]. At a higher level, we have objects that are persistent over time, actions that are local in time, and relations that can span an extended length. Thus searching for an answer in a video entails solving simultaneous sub-tasks in both the visual and lingual spaces, probably in an iterative and compositional fashion.
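The iterative, query-conditioned transformation of visual representations described above can be sketched as a small attention loop. This is a minimal illustration only, not the paper's model: the dot-product scoring, the residual-style state update, and all names and dimensions are assumptions made for clarity.

```python
import math

def softmax(scores):
    # numerically stable softmax over a list of scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def iterative_reasoning(clip_feats, query, steps=3):
    """Repeatedly attend over clip features conditioned on the current
    reasoning state, which is initialized from the query encoding.
    (A generic sketch of iterative visual-lingual reasoning.)"""
    state = list(query)
    weights = []
    for _ in range(steps):
        # attention weights: relevance of each clip to the current state
        weights = softmax([dot(state, c) for c in clip_feats])
        # attended summary of the video for this reasoning step
        summary = [sum(w * c[i] for w, c in zip(weights, clip_feats))
                   for i in range(len(state))]
        # update the state with a simple residual-style mix (an assumption)
        state = [0.5 * s + 0.5 * m for s, m in zip(state, summary)]
    return state, weights

# Toy example: three 2-d clip features and a 2-d query encoding
clips = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.2]
state, weights = iterative_reasoning(clips, query)
```

After each step the state blends in the clips it attends to, so clips aligned with the query progressively dominate the attention distribution, mimicking the compositional, multi-step grounding the text argues for.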
