Visual explanation for video recognition – twentybn – Medium


This post describes how temporally-sensitive saliency maps can be obtained for deep neural networks designed for video recognition. Previous work [2, 3, 4] shows that saliency maps help visualize why a model produced a given prediction, can uncover artifacts in the data, and can point towards better model architectures.

Task: Recognizing human actions in videos from our recently released dataset requires a fine-grained understanding of concepts like three-dimensional geometry, material properties, object permanence, affordance and gravity [1]. The dataset, dubbed "Something-Something", consists of 100,000 videos across 174 categories covering concepts such as dropping, picking and pushing.

Grad-CAM (Gradient-weighted Class Activation Mapping), proposed in [4], allows us to obtain a localization map for any target class. Please refer to [4] for more details.
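To make the Grad-CAM idea concrete, here is a minimal NumPy sketch of the core computation: each feature-map channel is weighted by the average gradient of the target class score with respect to that channel, the weighted channels are summed, and negative values are clipped. It assumes you have already extracted the activations of a convolutional layer and the matching gradients from your own framework. The function name `grad_cam`, the `(K, T, H, W)` layout, averaging over the time axis as well as the spatial axes, and the final rescaling to [0, 1] are illustrative assumptions, not necessarily what the post's authors implemented.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM style localization map (hypothetical helper, not from the post).

    activations: feature maps of one conv layer, shape (K, T, H, W)
                 (K channels, T time steps, H x W spatial grid; assumed layout).
    gradients:   d(class score) / d(activations), same shape.
    Returns a map of shape (T, H, W) scaled to [0, 1].
    """
    activations = np.asarray(activations, dtype=np.float64)
    gradients = np.asarray(gradients, dtype=np.float64)
    if activations.shape != gradients.shape:
        raise ValueError("activations and gradients must have the same shape")

    # Channel importance: average the gradients over time and space.
    weights = gradients.mean(axis=(1, 2, 3))  # shape (K,)

    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=([0], [0]))  # shape (T, H, W)

    # ReLU: keep only locations with a positive influence on the class score.
    cam = np.maximum(cam, 0.0)

    # Rescale for display (a convenience, assumed here).
    peak = cam.max()
    if peak > 0:
        cam = cam / peak
    return cam


# Usage with random stand-in data; real inputs would come from a trained model.
rng = np.random.default_rng(0)
acts = rng.standard_normal((8, 4, 7, 7))
grads = rng.standard_normal((8, 4, 7, 7))
saliency = grad_cam(acts, grads)
print(saliency.shape)  # (4, 7, 7): one map per time step
```

Because the map keeps the time axis, each of the `T` slices can be upsampled and overlaid on the corresponding video frames to see when, as well as where, the evidence for the class appears.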
