Unsupervised Semantic Action Discovery from Video Collections

Sener, Ozan, Zamir, Amir Roshan, Wu, Chenxia, Savarese, Silvio, Saxena, Ashutosh

May-11-2016–arXiv.org Machine Learning

Human communication takes many forms, including speech, text and instructional videos. It typically has an underlying structure, with a starting point, ending, and certain objective steps between them. In this paper, we consider instructional videos where there are tens of millions of them on the Internet. We propose a method for parsing a video into such semantic steps in an unsupervised way. Our method is capable of providing a semantic "storyline" of the video composed of its objective steps. We accomplish this using both visual and language cues in a joint generative model. Our method can also provide a textual description for each of the identified semantic steps and video segments. We evaluate our method on a large number of complex YouTube videos and show that our method discovers semantically correct instructions for a variety of tasks.

artificial intelligence, text processing, video, (19 more...)

arXiv.org Machine Learning

May-11-2016

arXiv.org PDF

Add feedback

Country:
- North America > United States > California > Santa Clara County (0.28)

Genre:
- Instructional Material > Course Syllabus & Notes (0.68)
- Research Report > New Finding (1.00)

Industry:
- Education > Educational Technology (0.95)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning
      - Learning Graphical Models > Undirected Networks
        Markov Models (0.47)
      - Statistical Learning > Clustering (0.93)
    - Natural Language
      - Grammars & Parsing (0.68)
      - Text Processing (0.68)
    - Representation & Reasoning (1.00)
    - Vision (1.00)
  - Communications (1.00)
  - Data Science > Data Mining (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found