CTRN: Class-Temporal Relational Network for Action Detection
Dai, Rui, Das, Srijan, Bremond, Francois
arXiv.org Artificial Intelligence
Action detection is an essential and challenging task, especially for densely labelled datasets of untrimmed videos. These datasets pose many real-world challenges, such as composite actions, co-occurring actions, and high temporal variation in instance duration. To handle these challenges, we propose to explore both the class and temporal relations of detected actions. In this work, we introduce an end-to-end network: Class-Temporal Relational Network (CTRN). It contains three key components, the first of which, the Representation Transform Module, filters the class-specific features from the mixed representations to build graph-structured data. We evaluate CTRN on three challenging densely labelled datasets and achieve state-of-the-art performance, reflecting the effectiveness and robustness of our method.

Action detection is a challenging computer vision problem that aims to find precise temporal boundaries of actions occurring in an untrimmed video. For instance, action detection algorithms on popular datasets like THUMOS (Jiang et al., 2014) and ActivityNet (Caba Heilbron et al., 2015) generally learn representations for single actions in a video.
Oct-26-2021