Relational Self-Attention: What's Missing in Attention for Video Understanding Supplementary Material Manjin Kim
–Neural Information Processing Systems
We use SGD with a momentum of 0.9 and set the batch size to 64 across 8 V100 GPUs. We use dropout of 0.3 before the final ... We use dropout of 0.5 before the final classifier. For FineGym [8], we sample a single clip consisting of 8 frames for inference. All the benchmarks we use are datasets commonly used for academic purposes. As described in Sec. 4.2, we ... In Figure 2, we provide pseudo-code for Eq. 11 and 12 of Sec. 4.2 in our ... For ease of description, the notation of the multi-query L is omitted.
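As a rough illustration of the optimizer settings quoted above, the following is a minimal sketch in plain Python, not the authors' training code. The momentum (0.9), batch size (64), and GPU count (8) come from the text. The learning rate is a hypothetical placeholder, since this excerpt does not state one. The even split of the batch across GPUs and the update convention (velocity accumulates the raw gradient, as in common deep learning frameworks) are likewise assumptions.

```python
MOMENTUM = 0.9        # from the text
GLOBAL_BATCH = 64     # from the text
NUM_GPUS = 8          # from the text
LEARNING_RATE = 0.01  # hypothetical placeholder; not stated in this excerpt


def per_gpu_batch(global_batch, num_gpus):
    """Samples per GPU, assuming the batch is split evenly across devices."""
    if global_batch % num_gpus != 0:
        raise ValueError("global batch must divide evenly across GPUs")
    return global_batch // num_gpus


def sgd_momentum_step(params, grads, velocity, lr=LEARNING_RATE, momentum=MOMENTUM):
    """One SGD-with-momentum update (assumed convention):

    v <- momentum * v + g
    p <- p - lr * v
    """
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    new_params = [p - lr * v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity
```

Under the even-split assumption, `per_gpu_batch(GLOBAL_BATCH, NUM_GPUS)` gives 8 samples per GPU. Calling `sgd_momentum_step` repeatedly with the returned velocity shows how the momentum term carries earlier gradients into later updates.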
Nov-14-2025, 00:18:47 GMT