Relational Self-Attention: What's Missing in Attention for Video Understanding Supplementary Material

Feb-8-2026, 09:47:54 GMT, Neural Information Processing Systems

For the bottlenecks that include RSA layers, we randomly initialize the weights using MSRA initialization [3] and set the gamma parameter of the last batch normalization layer to zero. We implement our model based on TSN in PyTorch, under the BSD 2-Clause license. All the benchmarks we used are datasets commonly used for academic purposes. Unless specified otherwise, the training and testing details are the same as those in Sec. 5.1. Since the RSA kernel generated by each query captures a distinct motion pattern, the model can learn diverse motion features (see Figure 3). In this experiment, we choose L = 8 as the default.
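The initialization recipe in the first sentence can be illustrated without a deep-learning framework. The sketch below is a toy under stated assumptions, not the authors' TSN/RSA code: it assumes 1x1x1 kernels (so each "conv" reduces to a matrix-vector product), a two-layer residual branch, and He/MSRA normal initialization in fan_in mode (std = sqrt(2 / fan_in)); the supplement does not say which fan mode was used. All function names (`msra_std`, `msra_init`, `toy_bottleneck`, ...) are hypothetical. It shows why zeroing the last BN gamma matters: with gamma = 0 (and the default beta = 0), the residual branch contributes nothing, so the block starts as an identity mapping.

```python
import math
import random


def msra_std(in_channels, kernel_size):
    """Std of He/MSRA normal init, fan_in mode (assumed): sqrt(2 / fan_in)."""
    fan_in = in_channels * math.prod(kernel_size)
    return math.sqrt(2.0 / fan_in)


def msra_init(out_channels, in_channels, kernel_size, rng):
    """Draw a flattened (out_channels x fan_in) weight matrix from N(0, std^2)."""
    std = msra_std(in_channels, kernel_size)
    fan_in = in_channels * math.prod(kernel_size)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(out_channels)]


def linear(w, x):
    """Apply a flattened weight matrix to one feature vector (a 1x1x1 'conv')."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Per-channel batch normalization followed by the affine gamma/beta step."""
    channels = len(batch[0])
    out = [[0.0] * channels for _ in batch]
    for c in range(channels):
        vals = [sample[c] for sample in batch]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        for i, v in enumerate(vals):
            out[i][c] = gamma[c] * (v - mean) / math.sqrt(var + eps) + beta[c]
    return out


def toy_bottleneck(batch, w1, w2, gamma, beta):
    """Hypothetical residual block: x + BN(W2 . relu(W1 . x)), before any final ReLU."""
    hidden = [[max(0.0, h) for h in linear(w1, x)] for x in batch]
    branch = batch_norm([linear(w2, h) for h in hidden], gamma, beta)
    return [[xi + bi for xi, bi in zip(x, b)] for x, b in zip(batch, branch)]


# Demo: with gamma = beta = 0 in the last BN, the block output equals its input.
rng = random.Random(0)
channels = 4
w1 = msra_init(channels, channels, (1, 1, 1), rng)
w2 = msra_init(channels, channels, (1, 1, 1), rng)
batch = [[rng.random() for _ in range(channels)] for _ in range(3)]
zeros = [0.0] * channels
out = toy_bottleneck(batch, w1, w2, gamma=zeros, beta=zeros)
print(out == batch)  # True
```

In PyTorch the same recipe is usually written with `torch.nn.init.kaiming_normal_` on the conv weights and `torch.nn.init.zeros_` on the last BN layer's `weight`; torchvision's ResNet exposes the zero-gamma trick as its `zero_init_residual` flag and uses `mode="fan_out"` for its Kaiming init, which differs from the fan_in choice assumed here.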

artificial intelligence, einsum, kernel
