Transformer in Transformer Supplemental Material

Neural Information Processing Systems 

We can see that for both DeiT-S and TNT-S, more patches are related as the layer goes deeper. An MLP is used to calculate the attention values, and the attention is multiplied with all the embeddings.

For object detection, we extract the features from different layers of TNT to construct multi-scale features. The COCO2017 val results are shown in Table 2, where TNT achieves much better results.

Table 2: Results of Faster RCNN object detection on COCO minival set with ImageNet pre-training.
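The MLP-based attention described above can be sketched as follows. This is a minimal, hypothetical NumPy illustration, not the paper's implementation: a two-layer MLP (assumed weights `w1`, `b1`, `w2`, `b2`) scores each token embedding, the scores are softmax-normalized over tokens, and the resulting attention weights multiply all the embeddings before summation.

```python
import numpy as np

def mlp_attention_pool(x, w1, b1, w2, b2):
    """Hypothetical sketch of MLP-computed attention over token embeddings.
    x: (tokens, dim); w1: (dim, hidden); b1: (hidden,); w2: (hidden, 1); b2: (1,).
    Returns a single (dim,) vector: the attention-weighted sum of all embeddings."""
    h = np.tanh(x @ w1 + b1)            # per-token hidden features, (tokens, hidden)
    scores = h @ w2 + b2                # one scalar score per token, (tokens, 1)
    a = np.exp(scores - scores.max())   # numerically stable softmax over tokens
    a = a / a.sum()
    return (a * x).sum(axis=0)          # attention multiplied to all embeddings, then summed
```

Because the softmax weights are non-negative and sum to one, the pooled vector is a convex combination of the token embeddings, so each of its coordinates lies within the range spanned by the tokens.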
