Spatiotemporal Deformable Scene Graphs for Complex Activity Detection

Salman Khan, Fabio Cuzzolin

arXiv.org Artificial Intelligence 

Complex activity recognition is attracting much attention in the computer vision research community due to its significance for a variety of real-world applications, such as autonomous driving [6, 7], surveillance [28], medical robotics [60] and team sports analysis [21]. In autonomous driving, for instance, the vehicle must understand dynamic road scenes in order, for example, to accurately predict the intentions of pedestrians and forecast their trajectories, and so make appropriate decisions. In surveillance, group activities, rather than actions performed by individuals, need to be monitored. Robotic assistant surgeons need to understand what the main surgeon is doing throughout a complex surgical procedure composed of many short-term actions and events [43] in order to assist them suitably. Recent methods for action or activity recognition and localisation can be broadly divided into two categories: single atomic action recognition/localisation [19, 30, 36, 54] and multiple atomic action recognition/localisation [22, 25, 31, 45, 51, 57]. Methods in the former category focus only on identifying the start and end of an action performed in a short video portraying a single instance, leveraging datasets such as UCF-101 [44] or Charades [38]. Methods in the latter category consider videos that contain a number of atomic actions or multiple repetitions of the same action. These methods do address complex activity recognition, as their aim is to understand an overall, dynamic scene by detecting and identifying its constituent components.
