Details

Neural Information Processing Systems 

To keep experiments uniform, we used a train/validation/test split for all datasets (STL-10, CIFAR-10, and CIFAR-100). We compared FED against four baselines. For every baseline, we tried learning rates of [0.1, 0.01, 0.001] and batch sizes of [32, 64, 100]. For EnDD and EnDD + AUX, we used the same temperature, temperature annealing schedule, and optimizer as the original paper. For AMT, we also tried alpha values of [1e1, 1e3, 1e5] and kept all other settings as in the original paper.
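The baseline search space described above can be sketched as a small grid enumeration. This is a minimal illustration, not the authors' code. It assumes that every combination of learning rate and batch size was tried (a full Cartesian product), that AMT's alpha was crossed with both, and that the best configuration was picked on the validation split. The helper names `baseline_grid` and `select_best` and the `val_score` callable are hypothetical.

```python
from itertools import product

# Search values stated in the text; shared by all baselines.
LEARNING_RATES = [0.1, 0.01, 0.001]
BATCH_SIZES = [32, 64, 100]
# Extra search dimension used only for AMT.
AMT_ALPHAS = [1e1, 1e3, 1e5]


def baseline_grid(method):
    """Return every hyperparameter configuration tried for one baseline.

    Assumption: a full Cartesian product of the listed values. Settings taken
    from the original papers (e.g. temperature, annealing, and optimizer for
    EnDD / EnDD + AUX) stay fixed and are not part of the grid.
    """
    configs = []
    for lr, bs in product(LEARNING_RATES, BATCH_SIZES):
        if method == "AMT":
            # AMT additionally sweeps alpha (assumed crossed with lr and batch size).
            for alpha in AMT_ALPHAS:
                configs.append({"method": method, "lr": lr, "batch_size": bs, "alpha": alpha})
        else:
            configs.append({"method": method, "lr": lr, "batch_size": bs})
    return configs


def select_best(configs, val_score):
    """Return the configuration with the highest validation score.

    `val_score` is a hypothetical callable that trains with a config and
    returns its score on the validation split.
    """
    return max(configs, key=val_score)
```

Under these assumptions, `len(baseline_grid("EnDD"))` is 9 and `len(baseline_grid("AMT"))` is 27, so AMT costs three times as many training runs as the other baselines.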
