Supplementary material for Space-time Mixing Attention for Video Transformer

Feb-10-2026, 09:59:48 GMT–Neural Information Processing Systems

Instead, we propose two new forms of aggregation: Temporal Attention aggregation and Summary Token. Is space-time attention all you need for video understanding? More is less: Learning efficient video representations by big-little network and depthwise temporal aggregation. arXiv
