Appendix
–Neural Information Processing Systems
"PE-cls PE-det"referstoperforming2D interpolation ofImageNet-1k pre-trained PE-cls toPE-det for object detection. The PEs added in the intermediate (Mid.) Weconcludethat: For a given YOLOS model, different self-attention heads focus on different patterns & differentlocations. We study the attention map differences of two YOLOS models,i.e., the 200 epochs ImageNet-1k [4] pre-trained YOLOS-S and the300 epochs ImageNet-1k pre-trained YOLOS-S. Note that the AP of these two models is the same (AP= 36.1).
Feb-11-2026, 11:37:20 GMT