Supplementary material for Space-time Mixing Attention for Video Transformer

Neural Information Processing Systems 

Instead we propose two new forms of aggregation: Temporal Attention aggregation and Summary Token. Is space-time attention all you need for video understanding? More is less: Learning efficient video representations by big-little network and depthwise temporal aggregation. arXiv
