Revisiting the Integration of Convolution and Attention for Vision Backbone

Feb-12-2026, 15:51:09 GMT – Neural Information Processing Systems

Convolutions (Convs) and multi-head self-attentions (MHSAs) are typically considered alternatives to each other for building vision backbones. Although some works try to integrate both, they apply the two operators simultaneously at the finest pixel granularity.

artificial intelligence, machine learning, natural language

Country:
- Asia > China > Hong Kong (0.04)

Genre:
- Research Report > Experimental Study (0.93)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (0.68)
  - Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Natural Language (0.68)
    - Machine Learning
      - Statistical Learning (1.00)
      - Neural Networks > Deep Learning (0.46)
