A Experimental Setup
Neural Information Processing Systems
A.1 Datasets
In Table 3, we provide the image size, the number of classes, and the number of training/test samples of each dataset used in our experiments.
A.2 Training Settings of Teacher
We provide the training settings of the teacher w.r.t. Meanings of abbreviations: opt: optimizer; lr: learning rate; wd: weight decay; mo: momentum; bs: batch size; ls: whether the learning rate is scaled with the batch size relative to a base batch size of 256 [15]; ld: learning rate decay; ldep: epochs at which the learning rate is decayed; ep: total number of epochs; wep: number of warm-up epochs.
A.3 Training Settings of MAD
In Tables 5, 6, and 7, we provide the training settings of MAD used in this paper. Despite multiple attempts, we could not find a single configuration that works well across all datasets and architectures.
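To illustrate how the teacher hyperparameters abbreviated in A.2 interact, the sketch below computes a per-epoch learning rate. It is a minimal sketch under stated assumptions, not the authors' code: it assumes that `ls` means the linear scaling rule (lr × bs / 256), that warm-up is linear over the first `wep` epochs, and that `ld` is a multiplicative factor applied at each epoch listed in `ldep`. The class `TeacherConfig` and the functions `effective_lr` and `lr_at_epoch` are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TeacherConfig:
    """Hypothetical container mirroring the abbreviations in A.2 (not the paper's code)."""
    lr: float      # base learning rate
    bs: int        # batch size
    ls: bool       # scale the learning rate with the batch size?
    ld: float      # ASSUMPTION: multiplicative decay factor applied at each ldep epoch
    ldep: List[int] = field(default_factory=list)  # epochs at which lr is decayed
    ep: int = 100  # total number of epochs
    wep: int = 0   # number of warm-up epochs
    base_bs: int = 256  # base batch size used for scaling


def effective_lr(cfg: TeacherConfig) -> float:
    """Peak learning rate; the linear scaling rule is ASSUMED when ls is True."""
    return cfg.lr * cfg.bs / cfg.base_bs if cfg.ls else cfg.lr


def lr_at_epoch(cfg: TeacherConfig, epoch: int) -> float:
    """Learning rate for a 0-indexed epoch: linear warm-up, then step decay."""
    peak = effective_lr(cfg)
    if epoch < cfg.wep:
        # Linear warm-up from peak / wep up to peak over the first wep epochs.
        return peak * (epoch + 1) / cfg.wep
    # Multiply by ld once for every decay milestone already reached.
    n_decays = sum(1 for m in cfg.ldep if epoch >= m)
    return peak * (cfg.ld ** n_decays)


if __name__ == "__main__":
    # Illustrative values only; the actual teacher settings are given in the paper's tables.
    cfg = TeacherConfig(lr=0.1, bs=512, ls=True, ld=0.1, ldep=[30, 60], ep=90, wep=5)
    for e in (0, 4, 29, 30, 60):
        print(e, lr_at_epoch(cfg, e))
```

With these illustrative values, the peak learning rate doubles to about 0.2 because the batch size is twice the base of 256, and it then drops by a factor of 10 at epochs 30 and 60. If the paper's `ld` instead names a schedule type (for example, step or cosine), only `lr_at_epoch` would need to change.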
May-29-2025, 13:02:49 GMT