DAC-DETR: Divide the Attention Layers and Conquer

May-1-2026, 05:05:51 GMT–Neural Information Processing Systems

This paper reveals a characteristic of the DEtection Transformer (DETR) that hampers its training efficacy: the cross-attention and self-attention layers in the DETR decoder have opposing effects on the object queries (although both effects are important). Specifically, we observe that cross-attention tends to gather multiple queries around the same object, while self-attention disperses these queries far apart. To improve training efficacy, we propose Divide-And-Conquer DETR (DAC-DETR), which separates out the cross-attention to avoid these competing objectives. During training, DAC-DETR employs an auxiliary decoder dedicated to learning the cross-attention layers. The auxiliary decoder shares all other parameters with the main decoder but has no self-attention layers, and it uses one-to-many label assignment to strengthen the gathering effect. Experiments show that DAC-DETR brings remarkable improvements over popular DETRs. For example, under the 12-epoch training scheme on MS-COCO, DAC-DETR improves Deformable DETR (ResNet-50) by +3.4 AP and, when built on popular methods (i.e., DINO and an IoU-related loss), achieves 50.9 AP (ResNet-50) / 58.1 AP (Swin-Large).
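The two training ingredients described above can be illustrated with a toy, pure-Python sketch. It is not the authors' implementation, and every name in it (`decoder_layer`, `one_to_one`, `one_to_many`, `ffn_w`) is hypothetical. Assumptions: attention is single-head dot-product attention without learned projections, the FFN is a scaled `tanh` standing in for the real MLP, one-to-one matching is done by brute force in place of the Hungarian algorithm, and one-to-many assignment simply picks the k lowest-cost queries per ground-truth object (the paper's exact one-to-many rule is not given here).

```python
import itertools
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attend(queries, keys):
    # Toy dot-product attention (values = keys, no learned projections).
    out = []
    for q in queries:
        w = softmax([sum(a * b for a, b in zip(q, k)) for k in keys])
        out.append([sum(wi * k[d] for wi, k in zip(w, keys)) for d in range(len(q))])
    return out


def add(a, b):
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(a, b)]


def decoder_layer(queries, memory, params, use_self_attn=True):
    # Main decoder: self-attention -> cross-attention -> FFN.
    # Auxiliary decoder (use_self_attn=False): self-attention is skipped, while the
    # cross-attention and FFN reuse the same `params` (shared weights).
    if use_self_attn:
        queries = add(queries, attend(queries, queries))
    queries = add(queries, attend(queries, memory))
    return [[math.tanh(params["ffn_w"] * x) for x in q] for q in queries]


def one_to_one(cost):
    # cost[q][g]: matching cost between query q and ground truth g.
    # Brute-force minimum-cost matching; only feasible for tiny toy sizes.
    n_q, n_gt = len(cost), len(cost[0])
    best = min(itertools.permutations(range(n_q), n_gt),
               key=lambda perm: sum(cost[q][g] for g, q in enumerate(perm)))
    return {q: g for g, q in enumerate(best)}


def one_to_many(cost, k):
    # Each ground truth takes its k lowest-cost queries; if a query is picked by
    # several ground truths, it keeps the first one (a crude, assumed tie rule).
    assign = {}
    for g in range(len(cost[0])):
        ranked = sorted(range(len(cost)), key=lambda q: cost[q][g])
        for q in ranked[:k]:
            assign.setdefault(q, g)
    return assign


cost = [[0.1, 0.9], [0.2, 0.8], [0.7, 0.3], [0.6, 0.4]]
print(one_to_one(cost))        # {0: 0, 2: 1}
print(one_to_many(cost, k=2))  # {0: 0, 1: 0, 2: 1, 3: 1}
```

With one-to-one assignment, each object supervises a single query, whereas the one-to-many assignment lets several queries near the same object be trained as positives. This is the gathering behavior the auxiliary, self-attention-free pass is meant to reinforce.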

artificial intelligence, machine learning, query


