A Appendix
Neural Information Processing Systems
Memory Cost of Self-attention Weights in DETR: DETR has six encoder-decoder pairs. The memory cost of the self-attention weight tensor during training, under different hyper-parameter settings and optimization strategies, is plotted in Figure 2. It shows that more attention ... For pedestrian detection tasks, we normally choose head=8 and downsampling ratio=0.25. Thus, deformable DETR is used in our work to save memory resources. Note that the original image size is 1024x2048. The detection head is the same as in CSP. Training: For anchor-free methods, the same ground truth and loss functions as in CSP are utilized.
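To give a rough sense of scale, the following is a minimal sketch of the size of a dense self-attention weight tensor for one encoder layer. It is not the authors' measurement code. The function name `attention_weight_bytes` is hypothetical, and the sketch assumes that the downsampling ratio applies to each spatial dimension, that weights are stored as 32-bit floats, that the batch size is 1, and that every token attends to every other token.

```python
def attention_weight_bytes(height, width, ratio, heads,
                           bytes_per_elem=4, batch=1):
    """Estimate the bytes needed for dense self-attention weights in one layer.

    Assumptions (not taken from the source): `ratio` scales each spatial
    side, weights are float32 (4 bytes), and attention is fully dense.
    """
    # Number of feature-map positions (tokens) after downsampling.
    tokens = int(height * ratio) * int(width * ratio)
    # One weight per (query, key) pair, per head, per batch element.
    return batch * heads * tokens * tokens * bytes_per_elem


if __name__ == "__main__":
    # Settings quoted in the text: 1024x2048 image, ratio 0.25, 8 heads.
    n_bytes = attention_weight_bytes(1024, 2048, 0.25, 8)
    print(f"{n_bytes / 2**30:.0f} GiB per layer")
```

Under these assumptions the feature map has 256 x 512 = 131072 tokens, so the weight tensor scales with the square of that count. This quadratic growth is the kind of cost that a sparse scheme such as deformable attention avoids.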
Aug-17-2025, 18:25:17 GMT