Appendix
Neural Information Processing Systems
In object detection and many other computer vision benchmarks, image resolutions and aspect ratios are usually not fixed, unlike in image classification. For the first layer, the position embedding (PE) is interpolated following ViT. In short, Type-I uses more PEs and Type-II uses a larger PE. In our paper, the small- and base-sized models use this setting. The detailed configurations are given in Tab. 1. PE-cls to PE-det Rand.
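As an illustration of interpolating a PE grid to a new resolution and aspect ratio, here is a minimal NumPy sketch. It is not the authors' implementation: the function name `resize_pos_embed`, the bilinear mode, and the endpoint-aligned sampling are all assumptions made for this example, and it handles only the patch-grid part of the PE (any embeddings for special tokens would need to be treated separately).

```python
import numpy as np


def resize_pos_embed(pe, old_hw, new_hw):
    """Bilinearly resize a (H*W, D) grid of position embeddings to (H'*W', D).

    Hypothetical helper for illustration; the interpolation mode is an assumption.
    """
    old_h, old_w = old_hw
    new_h, new_w = new_hw
    dim = pe.shape[1]
    grid = pe.reshape(old_h, old_w, dim)

    # Fractional sample positions in the old grid (corners map to corners).
    ys = np.linspace(0.0, old_h - 1, new_h)
    xs = np.linspace(0.0, old_w - 1, new_w)

    # Neighbouring integer indices, clamped at the far border.
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, old_h - 1)
    x1 = np.minimum(x0 + 1, old_w - 1)

    # Interpolation weights, shaped to broadcast over (new_h, new_w, dim).
    wy = (ys - y0)[:, None, None]
    wx = (xs - x0)[None, :, None]

    top = grid[y0][:, x0] * (1 - wx) + grid[y0][:, x1] * wx
    bottom = grid[y1][:, x0] * (1 - wx) + grid[y1][:, x1] * wx
    out = top * (1 - wy) + bottom * wy
    return out.reshape(new_h * new_w, dim)


# Usage: a 14x14 grid (e.g. from square pre-training inputs) resized
# to a non-square 32x50 grid for a detection input.
rng = np.random.default_rng(0)
pe = rng.standard_normal((14 * 14, 8))
resized = resize_pos_embed(pe, (14, 14), (32, 50))
print(resized.shape)
```

Because the target height and width are passed independently, the same routine covers inputs whose aspect ratio differs from the one used in pre-training.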
Nov-15-2025, 20:37:32 GMT