DAC-DETR: Divide the Attention Layers and Conquer

Jan-20-2025, 01:53:59 GMT–Neural Information Processing Systems

This paper reveals a characteristic of DEtection Transformer (DETR) that negatively impacts its training efficacy: the cross-attention and self-attention layers in the DETR decoder have contrary impacts on the object queries, though both impacts are important. Specifically, we observe that cross-attention tends to gather multiple queries around the same object, while self-attention disperses these queries far apart. To improve training efficacy, we propose a Divide-And-Conquer DETR (DAC-DETR) that separates the cross-attention layers from this conflict so that they can be learned more effectively. During training, DAC-DETR employs an auxiliary decoder that focuses on learning the cross-attention layers. The auxiliary decoder shares all other parameters with the original decoder but has no self-attention layers, and it employs one-to-many label assignment to strengthen the gathering effect.
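The parameter sharing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the single-head attention, the residual layout, and the names `decoder_layer`, `w_self`, `w_cross`, and `use_self_attention` are all assumptions made for illustration, and label assignment and losses are left out. The sketch only shows one set of weights driving two decoder paths, one with self-attention and one without.

```python
import math
import random


def matmul(rows, weight):
    """Multiply a list of row vectors by a weight matrix."""
    return [[sum(r[i] * weight[i][j] for i in range(len(r)))
             for j in range(len(weight[0]))] for r in rows]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out


def residual(x, update):
    """Element-wise residual connection."""
    return [[a + b for a, b in zip(r, u)] for r, u in zip(x, update)]


def decoder_layer(queries, memory, params, use_self_attention):
    """Hypothetical decoder layer: optional self-attention, then cross-attention."""
    if use_self_attention:
        q = matmul(queries, params["w_self"])
        queries = residual(queries, attention(q, q, queries))
    q = matmul(queries, params["w_cross"])
    return residual(queries, attention(q, memory, memory))


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim = 4
# One parameter set, used by both decoder paths (assumed sharing scheme).
params = {"w_self": random_matrix(rng, dim, dim),
          "w_cross": random_matrix(rng, dim, dim)}
queries = random_matrix(rng, 3, dim)  # 3 object queries
memory = random_matrix(rng, 5, dim)   # 5 encoder tokens

main_out = decoder_layer(queries, memory, params, use_self_attention=True)
aux_out = decoder_layer(queries, memory, params, use_self_attention=False)
```

In this sketch the auxiliary path never lets queries interact with each other, so each query's output depends only on that query and the encoder memory. In the main path, changing one query can change the outputs of the others.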

attention layer and conquer, auxiliary decoder, dac-detr

Technology:
- Information Technology > Artificial Intelligence (0.42)