For DM, due to high memory requirements, we were able to go up to a BatchEnsemble with an ensemble size of 8, while using a batch size of only 32. In addition, for this baseline we used a GPU with more memory, as the training did not fit on the standard 11 GB GPU used for the rest of our experiments. When creating the Mixup [8] auxiliary dataset, we used a Beta distribution with α = 0.2. In Mixup augmentation, a value λ ∈ [0, 1] is sampled from a Beta distribution. We use a batch size of 64.
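A minimal sketch of the Mixup sampling described above, assuming one-hot labels and Beta(α, α) with α = 0.2; the function name and array shapes are illustrative, not from the original implementation:

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Mixup: convexly combine each example with a shuffled partner.

    lam ~ Beta(alpha, alpha); x is a batch of inputs, y one-hot labels.
    """
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)          # lam in [0, 1]
    idx = rng.permutation(len(x))         # random pairing within the batch
    x_mix = lam * x + (1 - lam) * x[idx]
    y_mix = lam * y + (1 - lam) * y[idx]  # soft labels
    return x_mix, y_mix

# Usage with a batch of 64, as in the text (shapes are illustrative).
x = np.random.rand(64, 32, 32, 3)
y = np.eye(10)[np.random.randint(0, 10, size=64)]
x_mix, y_mix = mixup_batch(x, y, alpha=0.2)
```

With α = 0.2 the Beta distribution is strongly bimodal near 0 and 1, so most mixed examples stay close to one of the two originals.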