Revisiting the Integration of Convolution and Attention for Vision Backbone

Neural Information Processing Systems 

Convolutions (Convs) and multi-head self-attentions (MHSAs) are typically considered alternatives to each other for building vision backbones. Although some works try to integrate both, they apply the two operators simultaneously at the finest pixel granularity.
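To make the contrast concrete, here is a minimal NumPy sketch (not the paper's method) of applying a depthwise convolution and a single-head self-attention to the same feature map at full pixel granularity and summing their outputs; all shapes and weight names are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def depthwise_conv3x3(x, w):
    # x: (H, W, C) feature map; w: (3, 3, C) per-channel kernel; zero padding.
    H, W, C = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = (xp[i:i + 3, j:j + 3] * w).sum(axis=(0, 1))
    return out

def self_attention(x, wq, wk, wv):
    # Single-head self-attention over all H*W pixel tokens.
    H, W, C = x.shape
    t = x.reshape(-1, C)
    q, k, v = t @ wq, t @ wk, t @ wv
    attn = softmax(q @ k.T / np.sqrt(C), axis=-1)
    return (attn @ v).reshape(H, W, C)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4, 8))          # tiny 4x4 map with 8 channels
w_conv = rng.standard_normal((3, 3, 8)) * 0.1
wq, wk, wv = (rng.standard_normal((8, 8)) * 0.1 for _ in range(3))

# Both operators run on the same pixel grid; their outputs are fused by addition.
y = depthwise_conv3x3(x, w_conv) + self_attention(x, wq, wk, wv)
print(y.shape)  # → (4, 4, 8)
```

Running both operators at every pixel, as above, is exactly the design choice the abstract flags as costly, since attention alone is quadratic in the number of pixels.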
