RelationalSelf-Attention: What'sMissinginAttentionforVideoUnderstanding SupplementaryMaterial
–Neural Information Processing Systems
Forthebottlenecks including RSAlayers, werandomly initializeweights using MSRA initialization [3] and set the gamma parameter of the last batch normalization layer to zero. We implement our model based on TSN in Pytorch2 under BSD 2-Clause license. All the benchmarks that we used are commonly used datasets for the academic purpose. While specified otherwise, the training and testing details are the sameasthoseinSec.5.1. Since each RSA kernel generated by each query captures a distinct motion pattern, the model can learn diverse motion features(seeFigure3). Inthisexperiment,wechooseL = 8asthedefault.
Neural Information Processing Systems
Feb-8-2026, 09:47:54 GMT
- Technology: