0b7f639ef28a9035a71f7e0c04c1d681-Supplemental-Conference.pdf

Neural Information Processing Systems 

ForDM, due to high memory requirements, we were able to go up to aBatchEnsemble with an ensemble size of 8, while being able to use only batch size of 32. In addition, for this baseline we used a bigger memory GPU, unable tofitthetraining toourstandard 11GBGPU usedfortherestofour experiments. In the procedure of creating a Mixup [8] auxiliary dataset, we used a Beta distribution withα = 0.2. In Mixup augmentation, and valueλ [0,1] is sampled from a Beta distribution. We use batch size of 64.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found