Alignment-guided Temporal Attention for Video Action Recognition

Neural Information Processing Systems 

Temporal modeling is crucial for various video learning tasks. Most recent approaches employ either factorized (2D+1D) or joint (3D) spatial-temporal operations to extract temporal contexts from the input frames. While the former is computationally more efficient, the latter often achieves better performance. In this paper, we attribute this to a dilemma between the sufficiency and the efficiency of interactions among various positions in different frames. These interactions affect the extraction of task-relevant information shared among frames.
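To make the factorized-versus-joint distinction concrete, below is a minimal, hedged sketch (not taken from the paper) of the two operation styles on a clip tensor of shape (N, C, T, H, W); the channel counts, kernel sizes, and layer names are illustrative assumptions only.

```python
import torch
import torch.nn as nn

# Illustrative sketch (assumed shapes/channels, not the paper's architecture).

# Joint (3D): a single convolution mixes space and time in one step.
joint_3d = nn.Conv3d(64, 64, kernel_size=(3, 3, 3), padding=(1, 1, 1))

# Factorized (2D+1D): a spatial convolution followed by a temporal convolution.
factorized = nn.Sequential(
    nn.Conv3d(64, 64, kernel_size=(1, 3, 3), padding=(0, 1, 1)),  # 2D spatial
    nn.Conv3d(64, 64, kernel_size=(3, 1, 1), padding=(1, 0, 0)),  # 1D temporal
)

x = torch.randn(2, 64, 8, 56, 56)  # (N, C, T, H, W)
print(joint_3d(x).shape)    # torch.Size([2, 64, 8, 56, 56])
print(factorized(x).shape)  # torch.Size([2, 64, 8, 56, 56])
```

The factorized form restricts each step to interactions along either space or time, which is cheaper but limits cross-frame, cross-position interactions; the joint 3D form allows them all at once at higher cost, which is the trade-off the abstract refers to.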