Supplementary Materials for MLP-Mixer: An all-MLP Architecture for Vision
Neural Information Processing Systems
We did not observe any noticeable improvements. Token-mixing MLPs operate on only one channel at a time. All layers in Mixer retain the same isotropic design.
Table 1: Hyperparameter settings used for pre-training Mixer models.
However, these did not lead to consistent improvements, so we dropped them.
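The two structural points above (token-mixing MLPs see one channel at a time, and every layer keeps the same isotropic shape) can be illustrated with a minimal NumPy sketch of a single Mixer layer. It assumes the layer layout described in the main MLP-Mixer paper (LayerNorm, then an MLP, inside a skip connection), a tanh approximation of GELU, LayerNorm without learnable scale and shift, random initialization, and arbitrary small widths. The names `init_mixer_layer`, `token_mixing`, and `mixer_layer` are hypothetical and are not the authors' code.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the channel (last) axis; no learnable scale/shift in this sketch.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of GELU (an assumption of this sketch).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp(x, w1, b1, w2, b2):
    # Two-layer MLP applied along the last axis of x.
    return gelu(x @ w1 + b1) @ w2 + b2

def init_mixer_layer(rng, num_tokens, channels, token_hidden, channel_hidden):
    # Hypothetical initializer: small Gaussian weights, zero biases.
    def dense(n_in, n_out):
        return rng.normal(0.0, 0.02, size=(n_in, n_out)), np.zeros(n_out)
    tw1, tb1 = dense(num_tokens, token_hidden)
    tw2, tb2 = dense(token_hidden, num_tokens)
    cw1, cb1 = dense(channels, channel_hidden)
    cw2, cb2 = dense(channel_hidden, channels)
    return {"token": (tw1, tb1, tw2, tb2), "channel": (cw1, cb1, cw2, cb2)}

def token_mixing(u, params):
    # u has shape (num_tokens, channels). After the transpose, each row is one
    # channel; the same MLP mixes its values across tokens, so no information
    # flows between channels in this step.
    return mlp(u.T, *params["token"]).T

def mixer_layer(x, params):
    # Token mixing with a skip connection, then channel mixing with a skip connection.
    y = x + token_mixing(layer_norm(x), params)
    # Channel mixing: the same MLP is applied to every token (row) independently.
    return y + mlp(layer_norm(y), *params["channel"])

# Usage: stack four layers; every layer maps (num_tokens, channels) to the same shape.
rng = np.random.default_rng(0)
num_tokens, channels = 16, 8
x = rng.normal(size=(num_tokens, channels))
layers = [init_mixer_layer(rng, num_tokens, channels, 32, 64) for _ in range(4)]
h = x
for p in layers:
    h = mixer_layer(h, p)
print(h.shape)  # (16, 8)
```

Because each layer in this sketch maps a (tokens, channels) array to an array of the same shape, layers can be stacked without any resizing, which is what an isotropic design amounts to in practice.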
Nov-15-2025, 16:53:57 GMT