Transformer in Transformer: Supplemental Material
Neural Information Processing Systems
This is because the information between patches has already been fully communicated in the deeper layers.

An MLP is used to calculate the attention values, and the attention is multiplied with all the embeddings. The SE module only brings in a few extra parameters but is able to perform dimension-wise attention for feature enhancement.

In particular, FPN takes 4 levels of features (1/4, 1/8, 1/16, 1/32) as input, while the resolution of the features of every TNT block is 1/16.
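The SE-style dimension-wise attention described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the two projection matrices (`w1`, `w2`), the ReLU/sigmoid choices, and the reduction ratio `r` are assumptions in the spirit of squeeze-and-excitation; the attention vector is computed from token-averaged embeddings and broadcast-multiplied back onto every embedding.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_attention(x, w1, w2):
    """SE-style dimension-wise attention over token embeddings (sketch).

    x:  (num_tokens, dim) patch embeddings
    w1: (dim, dim // r)   squeeze projection (assumed shape)
    w2: (dim // r, dim)   excitation projection (assumed shape)
    """
    squeezed = x.mean(axis=0)                # (dim,) average over all tokens
    hidden = np.maximum(squeezed @ w1, 0.0)  # small MLP with ReLU (assumption)
    attn = sigmoid(hidden @ w2)              # (dim,) per-dimension attention in (0, 1)
    return x * attn                          # multiply attention onto all embeddings

rng = np.random.default_rng(0)
dim, r = 8, 4
x = rng.standard_normal((16, dim))
w1 = rng.standard_normal((dim, dim // r)) * 0.1
w2 = rng.standard_normal((dim // r, dim)) * 0.1
y = se_attention(x, w1, w2)
print(y.shape)  # (16, 8)
```

Because the attention vector has only `dim` entries produced by a `dim -> dim/r -> dim` MLP, the parameter overhead is a few thousand weights per block, consistent with the "few extra parameters" claim.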