gfnet
- Asia > China > Beijing > Beijing (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Global Filter Networks for Image Classification
Recent advances in self-attention and pure multi-layer perceptron (MLP) models for vision have shown great potential in achieving promising performance with fewer inductive biases. These models are generally based on learning interactions among spatial locations from raw data. The complexity of self-attention and MLP grows quadratically as the image size increases, which makes these models hard to scale up when high-resolution features are required. In this paper, we present the Global Filter Network (GFNet), a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity. Our architecture replaces the self-attention layer in vision transformers with three key operations: a 2D discrete Fourier transform, an element-wise multiplication between frequency-domain features and learnable global filters, and a 2D inverse Fourier transform.
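The three operations named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `global_filter_layer`, the `(H, W, C)` tensor layout, the 14x14x8 feature map, the use of the real-input FFT (`rfft2`/`irfft2`), and the `"ortho"` normalization are all assumptions made for illustration, and the filter here is a fixed array rather than a trained parameter.

```python
import numpy as np


def global_filter_layer(x, filt):
    """Mix spatial locations of a real feature map in the frequency domain.

    x:    real array of shape (H, W, C)
    filt: complex array of shape (H, W // 2 + 1, C); stands in for the
          learnable global filter (hypothetical layout, not from the paper)
    """
    h, w, _ = x.shape
    # Step 1: 2D discrete Fourier transform over the two spatial axes.
    # rfft2 keeps only the non-redundant half of the spectrum of a real input.
    freq = np.fft.rfft2(x, axes=(0, 1), norm="ortho")
    # Step 2: element-wise multiplication with the global filter.
    freq = freq * filt
    # Step 3: 2D inverse Fourier transform back to the spatial domain.
    return np.fft.irfft2(freq, s=(h, w), axes=(0, 1), norm="ortho")


rng = np.random.default_rng(0)
H, W, C = 14, 14, 8  # assumed sizes, chosen only for the demo
x = rng.standard_normal((H, W, C))

# An all-ones filter leaves the features unchanged.
identity_filter = np.ones((H, W // 2 + 1, C), dtype=np.complex128)
y_identity = global_filter_layer(x, identity_filter)

# An arbitrary complex filter produces a real output of the same shape.
random_filter = rng.standard_normal((H, W // 2 + 1, C)) + 1j * rng.standard_normal(
    (H, W // 2 + 1, C)
)
y = global_filter_layer(x, random_filter)
```

Because the forward and inverse transforms are computed with the FFT, the cost per channel scales as O(HW log(HW)) in the number of spatial locations, which is consistent with the log-linear complexity stated in the abstract.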
Using Bounding Boxes
We agree that it is a nice idea to exploit the bounding boxes of ImageNet, and we are happy to explore it in GFNet. We will add more comparisons on this point in our revision. The iterative process is indeed not indispensable, but in our experiments it improves accuracy (e.g., ...). We will make these points clear in the revision. We will release all the code and pre-trained models upon acceptance of this paper.
Table 1: Results using 32x32 (left) and 64x64 (right) patches.
- Information Technology > Graphics (0.65)
- Information Technology > Artificial Intelligence > Machine Learning (0.32)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.95)
- Information Technology > Artificial Intelligence > Natural Language (0.68)