Scattering Vision Transformer: Spectral Mixing Matters
Neural Information Processing Systems
Vision transformers have gained significant attention and achieved state-of-the-art performance in various computer vision tasks, including image classification, instance segmentation, and object detection. However, challenges remain in addressing attention complexity and effectively capturing fine-grained information within images. Existing solutions often resort to down-sampling operations, such as pooling, to reduce computational cost. Unfortunately, such operations are non-invertible and can result in information loss. In this paper, we present a novel approach called Scattering Vision Transformer (SVT) to tackle these challenges. SVT incorporates a spectrally scattering network that enables the capture of intricate image details.
Jan-19-2025, 18:35:22 GMT