Alignment-guided Temporal Attention for Video Action Recognition
Neural Information Processing Systems
Temporal modeling is crucial for various video learning tasks. Most recent approaches employ either factorized (2D+1D) or joint (3D) spatial-temporal operations to extract temporal contexts from the input frames. While the former is more efficient in computation, the latter often obtains better performance. In this paper, we attribute this to a dilemma between the sufficiency and the efficiency of interactions among various positions in different frames. These interactions affect the extraction of task-relevant information shared among frames.
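The efficiency gap between the two families can be illustrated with a rough count of pairwise interactions. The sketch below is not taken from the paper: it assumes attention-style operations over T frames of N spatial positions each, and the function names and the values T = 8, N = 196 are hypothetical, chosen only for illustration.

```python
def joint_pairs(num_frames: int, num_positions: int) -> int:
    """Pairs when every position in every frame interacts with all others (joint 3D)."""
    tokens = num_frames * num_positions
    return tokens * tokens


def factorized_pairs(num_frames: int, num_positions: int) -> int:
    """Pairs for a spatial step within each frame plus a temporal step
    across frames at each fixed position (factorized 2D+1D)."""
    spatial = num_frames * num_positions * num_positions
    temporal = num_positions * num_frames * num_frames
    return spatial + temporal


if __name__ == "__main__":
    T, N = 8, 196  # assumed clip length and positions per frame
    print("joint:", joint_pairs(T, N))
    print("factorized:", factorized_pairs(T, N))
```

Under these assumptions the factorized form evaluates far fewer pairs, but its temporal step only links the same position across frames, which is one way to read the sufficiency side of the dilemma.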
alignment-guided temporal attention, task-relevant information, video action recognition
Oct-11-2024, 04:07:55 GMT
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (0.44)
- Vision (0.45)