Focal Attention for Long-Range Interactions in Vision Transformers

Dec-25-2025, 07:45:40 GMT – Neural Information Processing Systems

Vision Transformers and their variants have recently shown great promise on various computer vision tasks. Their ability to capture both local and global visual dependencies through self-attention is key to their success. However, self-attention also incurs a computational cost that grows quadratically with the number of tokens, which is especially challenging for high-resolution vision tasks (e.g., object detection). Many recent works have attempted to reduce this cost and improve model performance by applying either coarse-grained global attention or fine-grained local attention.
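
To make the quadratic overhead concrete, here is a rough back-of-the-envelope sketch that counts query-key pairs in a single attention layer. The patch size of 16, the window size of 7, and the helper name attention_pairs are illustrative assumptions, not values or code taken from the paper. The sketch only contrasts plain global attention with plain windowed local attention and does not implement the paper's own method.

```python
def attention_pairs(height, width, patch=16, window=None):
    """Count query-key pairs for one attention layer over a patch grid.

    Hypothetical helper for illustration only. `patch` and `window` are
    assumed values, not settings reported in the source.
    """
    rows, cols = height // patch, width // patch
    tokens = rows * cols
    if window is None:
        # Global attention: every token attends to every token.
        return tokens * tokens
    # Local attention: each token attends only to the tokens inside its
    # own window x window block (assumes the grid divides evenly).
    if rows % window or cols % window:
        raise ValueError("patch grid must be divisible by the window size")
    return tokens * window * window


global_224 = attention_pairs(224, 224)           # 196 tokens
global_896 = attention_pairs(896, 896)           # 3136 tokens
local_896 = attention_pairs(896, 896, window=7)  # 3136 tokens, 49 keys each

print(global_224, global_896, local_896)
print(global_896 // global_224)  # 4x the side length -> 256x the pairs
```

Under these assumptions, quadrupling the image side length multiplies the global-attention pair count by 256, while the windowed variant grows only linearly with the number of tokens, at the price of not seeing anything outside each window.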

long-range interaction, transformer, vision transformer, (9 more...)

Technology:
- Information Technology > Artificial Intelligence > Vision (1.00)