Masked Image Modeling Supplementary Material

Anonymous Author(s)

Neural Information Processing Systems

1 More Training Details

We use the same settings for RevCol models of all sizes during MIM pre-training. The hyper-parameters generally follow [4, 2]. Table 3 shows the detailed training settings used after MIM pre-training. We also show the training settings on ImageNet-1K after ImageNet-22K fine-tuning. For semantic segmentation, we evaluate the different backbones on the ADE20K dataset.