Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets: Supplementary Materials

May-30-2025, 07:27:45 GMT–Neural Information Processing Systems

Code is modified from https://github.com/coeusguo/ceit. ... Module): def __init__(self, dim, num_heads=8): super(). ...

All the models are pre-trained on ImageNet-1K [1] only and then fine-tuned on the CIFAR-100 [2] dataset. Results are shown in Table 1. We cite the reported results from the corresponding papers. When fine-tuning our DHVT, we use the AdamW optimizer with a cosine learning rate scheduler and 2 warm-up epochs, a batch size of 256, an initial learning rate of 0.0005, a weight decay of 1e-8, and 100 fine-tuning epochs.
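The fine-tuning schedule described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' training code: the text gives the base learning rate, warm-up length, epoch count, batch size, and weight decay, but it does not say how the warm-up is shaped, what learning rate it starts from, or what floor the cosine decay reaches. The linear warm-up, `WARMUP_START_LR`, `MIN_LR`, and the per-epoch granularity are all assumptions, and `lr_at_epoch` is a hypothetical helper name.

```python
import math

# Values stated in the text.
BASE_LR = 5e-4        # initial learning rate
WEIGHT_DECAY = 1e-8   # passed to the AdamW optimizer (not used by the schedule)
BATCH_SIZE = 256
EPOCHS = 100
WARMUP_EPOCHS = 2

# Assumptions, not stated in the text: linear warm-up starting from
# WARMUP_START_LR, and cosine decay down to MIN_LR.
WARMUP_START_LR = 1e-6
MIN_LR = 0.0


def lr_at_epoch(epoch: int) -> float:
    """Return the learning rate for a zero-indexed epoch."""
    if not 0 <= epoch < EPOCHS:
        raise ValueError(f"epoch must be in [0, {EPOCHS}), got {epoch}")
    if epoch < WARMUP_EPOCHS:
        # Linear ramp from WARMUP_START_LR towards BASE_LR.
        frac = epoch / WARMUP_EPOCHS
        return WARMUP_START_LR + frac * (BASE_LR - WARMUP_START_LR)
    # Cosine decay over the remaining epochs.
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for e in (0, 1, 2, 51, 99):
        print(f"epoch {e:3d}: lr = {lr_at_epoch(e):.3e}")
```

Under these assumptions the learning rate peaks at 0.0005 in the first epoch after warm-up and falls to half that value midway through the cosine phase. In an actual training loop the returned value would be written into the optimizer's parameter groups at the start of each epoch.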

artificial intelligence, head token, machine learning
