Transformer in Transformer Supplemental Material
Neural Information Processing Systems
We can see that for both DeiT-S and TNT-S, more patches are related as the layer goes deeper.

An MLP is used to calculate the attention values, and the attention is applied to all the embeddings.

We extract features from different layers of TNT to construct multi-scale features. The COCO2017 val results are shown in Table 2. TNT achieves much better results.

Table 2: Results of Faster RCNN object detection on COCO minival set with ImageNet pre-training.
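The multi-scale feature construction above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: it assumes a ViT-style (H*W, C) token sequence per layer (class token removed), reshapes each into a 2D feature map, and average-pools by a hypothetical per-level stride to mimic the coarser levels of an FPN-style pyramid fed to Faster RCNN.

```python
import numpy as np

def tokens_to_feature_map(tokens, grid_size):
    """Reshape an (H*W, C) sequence of patch embeddings into a (C, H, W) map.

    tokens: patch embeddings from one transformer layer (class token removed).
    grid_size: (H, W) of the patch grid, e.g. (14, 14) for 224px input / 16px patches.
    """
    h, w = grid_size
    n, c = tokens.shape
    assert n == h * w, "token count must match the patch grid"
    return tokens.reshape(h, w, c).transpose(2, 0, 1)

def build_pyramid(layer_outputs, grid_size, strides=(1, 2, 4)):
    """Build multi-scale feature maps from several transformer depths.

    layer_outputs: list of (H*W, C) arrays taken from different layers.
    strides: downsampling factor per pyramid level (a hypothetical choice,
             not taken from the paper).
    """
    pyramid = []
    for tokens, s in zip(layer_outputs, strides):
        fmap = tokens_to_feature_map(tokens, grid_size)
        c, h, w = fmap.shape
        # crop so the spatial dims divide by s, then average-pool by factor s
        pooled = fmap[:, :h - h % s, :w - w % s]
        pooled = pooled.reshape(c, (h - h % s) // s, s, (w - w % s) // s, s)
        pyramid.append(pooled.mean(axis=(2, 4)))
    return pyramid

# usage: three layers of a TNT-S-like model (196 tokens, 384 channels)
outs = [np.random.rand(196, 384) for _ in range(3)]
pyramid = build_pyramid(outs, grid_size=(14, 14))
```

In practice a detector would also apply a 1x1 convolution per level to align channel widths; the pooling here only stands in for the spatial rescaling.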