
Neural Information Processing Systems 

A.1 Vector Transformer: Pairwise vector attention

Here, we summarize the details of the Vector Transformer used in the Bottleneck Transformer experiments. In particular, we follow the formulation of [5], which enables a vector-version of the Transformer, although it is also possible to incorporate other attention mechanisms. Instead of computing the dot product between $Q$ and $K$ as $QK^T$ to generate the attention 'matrix', this vector formulation computes an attention 'tensor' $\{\gamma(f_q(z_i) \odot f_k(z_j))\}_{(i,j)}$, preserving the channel information.

Table 1: TokenLearner compared against pooling-based token reduction.

The training was done for 100k iterations with the batch size of 4 per TPU core (i.e., 4*64=256 was our batch size) in the Charades experiments.
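The contrast in A.1 between an attention 'matrix' and an attention 'tensor' can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: f_q and f_k are modeled as bias-free linear maps given by hypothetical weight matrices `w_q` and `w_k`, the combination of f_q(z_i) and f_k(z_j) is taken to be an element-wise (per-channel) product, γ is taken to be a plain softmax over j, and the usual scaling of the dot-product scores is omitted. The function names and toy tokens are illustrative only.

```python
import math

def linear(x, w):
    """Apply a d_in x d_out weight matrix w to a vector x (no bias)."""
    return [sum(x[a] * w[a][c] for a in range(len(x))) for c in range(len(w[0]))]

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def attention_matrix(tokens, w_q, w_k):
    """Dot-product attention: one scalar weight per (i, j) pair."""
    q = [linear(z, w_q) for z in tokens]
    k = [linear(z, w_k) for z in tokens]
    scores = [[sum(qc * kc for qc, kc in zip(qi, kj)) for kj in k] for qi in q]
    return [softmax(row) for row in scores]  # shape S x S

def attention_tensor(tokens, w_q, w_k):
    """Pairwise vector attention: one weight per (i, j, channel)."""
    q = [linear(z, w_q) for z in tokens]
    k = [linear(z, w_k) for z in tokens]
    s, d = len(tokens), len(q[0])
    tensor = [[[0.0] * d for _ in range(s)] for _ in range(s)]
    for i in range(s):
        for c in range(d):
            # Assumed gamma: softmax over j, applied separately to each channel c.
            weights = softmax([q[i][c] * k[j][c] for j in range(s)])
            for j in range(s):
                tensor[i][j][c] = weights[j]
    return tensor  # shape S x S x d

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # S = 3 toy tokens with d = 2 channels
identity = [[1.0, 0.0], [0.0, 1.0]]            # placeholder projection weights
A = attention_matrix(tokens, identity, identity)
T = attention_tensor(tokens, identity, identity)
```

Under these assumptions, `A[i][j]` is a single weight shared by all channels, while `T[i][j]` holds one weight per channel, so each channel can attend to a different set of tokens.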
