6a30e32e56fce5cf381895dfe6ca7b6f-Supplemental.pdf

Feb-9-2026, 05:02:26 GMT–Neural Information Processing Systems

A.1 Vector Transformer: Pairwise vector attention

Here, we summarize the details of the Vector Transformer used in the Bottleneck Transformer experiments. In particular, we follow the formulation of [5], which enables a vector version of the Transformer, although it is also possible to incorporate other attention mechanisms. Instead of computing the dot product between $Q$ and $K$ as $QK^T$ to generate the attention "matrix", this vector formulation computes an attention "tensor" $\{\gamma(f_q(z_i) - f_k(z_j))\}_{(i,j)}$, preserving the channel information.

Table 1: TokenLearner compared against pooling-based token reduction. The training was done for 100k iterations with a batch size of 4 per TPU core (i.e., 4 × 64 = 256 was our batch size) in the Charades experiments.
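The difference between the attention "matrix" and the attention "tensor" can be illustrated with a short NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the function names `scalar_attention` and `vector_attention` and the weights `W1`, `W2` are hypothetical; $f_q$, $f_k$, $f_v$ are taken to be linear maps; $\gamma$ is assumed to be a two-layer MLP with ReLU; the relation between $f_q(z_i)$ and $f_k(z_j)$ is assumed to be subtraction; and the per-channel weights are assumed to be softmax-normalized over $j$ and applied elementwise to the values.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scalar_attention(z, Wq, Wk, Wv):
    """Standard attention: one scalar weight per (i, j) pair."""
    q, k, v = z @ Wq, z @ Wk, z @ Wv            # each (N, C)
    logits = q @ k.T / np.sqrt(q.shape[-1])     # attention 'matrix', (N, N)
    return softmax(logits, axis=1) @ v          # (N, C)


def vector_attention(z, Wq, Wk, Wv, W1, W2):
    """Pairwise vector attention: one C-dim weight vector per (i, j) pair.

    Assumptions (hypothetical, for illustration): f_q, f_k, f_v are linear,
    the relation is f_q(z_i) - f_k(z_j), and gamma is a ReLU MLP (W1, W2).
    """
    q, k, v = z @ Wq, z @ Wk, z @ Wv            # each (N, C)
    rel = q[:, None, :] - k[None, :, :]         # f_q(z_i) - f_k(z_j), (N, N, C)
    logits = np.maximum(rel @ W1, 0.0) @ W2     # gamma(...): attention 'tensor', (N, N, C)
    weights = softmax(logits, axis=1)           # normalize over j, separately per channel
    return (weights * v[None, :, :]).sum(axis=1)  # channel-wise weighted sum, (N, C)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, C, H = 5, 8, 16                          # tokens, channels, hidden width of gamma
    z = rng.standard_normal((N, C))
    Wq, Wk, Wv = (rng.standard_normal((C, C)) for _ in range(3))
    W1, W2 = rng.standard_normal((C, H)), rng.standard_normal((H, C))
    print(scalar_attention(z, Wq, Wk, Wv).shape)          # (5, 8)
    print(vector_attention(z, Wq, Wk, Wv, W1, W2).shape)  # (5, 8)
```

The key difference is the shape of the intermediate `logits`: it is N × N in the scalar case but N × N × C in the vector case, which is what lets each channel receive its own attention weight, at the cost of C times more memory for the attention weights.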

arxiv preprint arxiv, transformer experiment, vector transformer

Conferences PDF

Technology:
- Information Technology (0.33)

Duplicate Docs

Title
TokenLearner: Adaptive Space-Time Tokenization for Videos - Supplementary Materials - Michael S. Ryoo

Similar Docs

Title	Similarity	Source
None found