Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets
Neural Information Processing Systems
ViTs to attend to spatial relevance. Second, along the channel dimension, representations exhibit diversity across different channels, but scarce data does not allow ViTs to learn representations strong enough for accurate recognition.
Nov-14-2025, 11:47:49 GMT