Reviews: A flexible model for training action localization with varying levels of supervision
–Neural Information Processing Systems
Paper Summary: The paper describes a method for spatio-temporal human action localization in temporally untrimmed videos based on discriminative clustering [3, 47]. The main contribution is a new action detection approach that is flexible in the sense that it can be trained with various levels and amounts of supervision. For example, the model can be trained with very weak supervision, i.e., using only ground-truth video-level action labels, or with full supervision, i.e., with dense per-frame bounding boxes and their class labels. Experimental results demonstrate the strengths and weaknesses of a wide range of supervisory signals, such as video-level action labels, a single temporal point, one ground-truth (GT) bounding box, temporal bounds, etc. The method is experimentally evaluated on the UCF-101-24 and DALY action detection datasets.
Oct-7-2024, 10:27:52 GMT