ff1418e8cc993fe8abcfe3ce2003e5c5-Supplemental.pdf
Neural Information Processing Systems
The table (right) shows 100-epoch results using the best lr and wd values found at 50 epochs. ViT's patchify stem differs from the proposed convolutional stem in the type of convolution used and ... We investigate these factors next. The focus of this paper is studying the large, positive impact of changing ViT's default ... We use AdamW for all experiments. Figure 7 shows the results. The table (right) shows 100-epoch results using optimal lr and wd values chosen from the 50-epoch runs.
Aug-19-2025, 02:50:45 GMT