A Training Details
All experiments were performed on a single Tesla V100 GPU. We use these trained networks and treat them as pre-trained models, i.e., we consider the "IC-only" setup, in which the base network is not changed. For CIFAR-10 and CIFAR-100, we train the ICs for 50 epochs using the Adam optimizer with an initial learning rate of 0.001, lowered by a factor of 10 after 15 epochs. When training on Tiny ImageNet, the learning rate is additionally lowered by the same factor after epoch 40. On ImageNet (using the pretrained ResNet-50 from the torchvision package), the ICs are trained for 40 epochs, with an initial learning rate of 0.00001 that is reduced by a factor of 10 at epochs 20 and 30.
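For concreteness, the learning-rate schedules described above can be written as a step decay over per-dataset milestones. The following is a minimal sketch, not the original training code. The names `SCHEDULES` and `ic_learning_rate` are hypothetical. It also rests on three assumptions: epochs are 0-indexed, a drop "after N epochs" takes effect from epoch index N onward, and Tiny ImageNet uses the same 50-epoch budget and initial rate as CIFAR.

```python
# Step-decay schedules for IC training, keyed by dataset.
# Assumptions (not stated in the text): epochs are 0-indexed, each
# milestone is the first epoch trained at the lowered rate, and
# Tiny ImageNet shares the CIFAR epoch budget and initial rate.
SCHEDULES = {
    "cifar10": {"epochs": 50, "base_lr": 1e-3, "milestones": (15,)},
    "cifar100": {"epochs": 50, "base_lr": 1e-3, "milestones": (15,)},
    "tiny_imagenet": {"epochs": 50, "base_lr": 1e-3, "milestones": (15, 40)},
    "imagenet": {"epochs": 40, "base_lr": 1e-5, "milestones": (20, 30)},
}


def ic_learning_rate(dataset, epoch, gamma=0.1):
    """Return the learning rate used in `epoch` for the given dataset."""
    cfg = SCHEDULES[dataset]
    if not 0 <= epoch < cfg["epochs"]:
        raise ValueError(f"epoch {epoch} outside 0..{cfg['epochs'] - 1}")
    # Count how many milestones have been reached; each one divides
    # the rate by 10 (gamma = 0.1).
    drops = sum(1 for milestone in cfg["milestones"] if epoch >= milestone)
    return cfg["base_lr"] * gamma ** drops


if __name__ == "__main__":
    for name in SCHEDULES:
        rates = [ic_learning_rate(name, e) for e in range(SCHEDULES[name]["epochs"])]
        print(name, sorted(set(rates), reverse=True))
```

In a PyTorch setup, the same milestones and decay factor could be passed to a multi-step scheduler attached to the Adam optimizer; the plain function is used here only to keep the sketch free of dependencies.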