Supplementary Materials for MLP-Mixer: An all-MLP Architecture for Vision, Lucas Beyer
–Neural Information Processing Systems
A.1 Modifying the token-mixing MLPs

We ablated a number of ideas for improving the token-mixing MLPs of Mixer models at various scales, pre-trained on JFT-300M.

Untying (not sharing) the parameters. By default, the same token-mixing MLP is applied to each of the C channels. Instead, we could introduce C separate MLPs with independent weights, effectively multiplying the number of token-mixing parameters by C. We did not observe any noticeable improvements.

Grouping the channels together. Token-mixing MLPs take S-dimensional vectors as inputs. Every such vector contains the values of a single feature across S different spatial locations. In other words, token-mixing MLPs operate on only one channel at a time.
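The difference between a shared token-mixing MLP and the untied variant with C separate MLPs can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the sizes S, C and D_S, the initialisation scale, the tanh approximation of GELU, and all function names are assumptions chosen for the example.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU (an assumption; any smooth nonlinearity works here)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def init_token_mlp(rng, S, D_S, copies=None):
    # Two-layer MLP acting on S-dimensional vectors.
    # copies=None -> one MLP shared by all channels.
    # copies=C    -> C independent MLPs (leading axis indexes the channel).
    lead = () if copies is None else (copies,)
    return {
        "W1": rng.normal(0.0, 0.02, lead + (S, D_S)),
        "b1": np.zeros(lead + (D_S,)),
        "W2": rng.normal(0.0, 0.02, lead + (D_S, S)),
        "b2": np.zeros(lead + (S,)),
    }

def token_mix_shared(X, p):
    # X has shape (S, C); each column (one channel) goes through the same MLP.
    cols = X.T                                   # (C, S)
    h = gelu(cols @ p["W1"] + p["b1"])           # (C, D_S)
    return (h @ p["W2"] + p["b2"]).T             # (S, C)

def token_mix_untied(X, p):
    # Channel c is processed by its own weights p["W1"][c], p["W2"][c].
    cols = X.T                                   # (C, S)
    h = gelu(np.einsum("cs,csd->cd", cols, p["W1"]) + p["b1"])
    return (np.einsum("cd,cds->cs", h, p["W2"]) + p["b2"]).T

def num_params(p):
    return sum(v.size for v in p.values())

# Hypothetical small sizes, chosen only for illustration.
S, C, D_S = 16, 8, 32
rng = np.random.default_rng(0)
X = rng.normal(size=(S, C))

shared = init_token_mlp(rng, S, D_S)
untied = init_token_mlp(rng, S, D_S, copies=C)

Y_shared = token_mix_shared(X, shared)
Y_untied = token_mix_untied(X, untied)

# The untied variant has exactly C times as many token-mixing parameters.
ratio = num_params(untied) // num_params(shared)
print(Y_shared.shape, Y_untied.shape, ratio)
```

Both variants map an (S, C) table to an (S, C) table. The only difference is the leading channel axis on the untied weights, which is where the factor of C in the parameter count comes from.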
Mar-21-2025, 21:02:38 GMT