
Neural Information Processing Systems

For DM, due to high memory requirements, we were only able to go up to a BatchEnsemble with an ensemble size of 8, and only with a batch size of 32. In addition, for this baseline we used a GPU with more memory, as the training did not fit on the standard 11 GB GPU used for the rest of our experiments. To create a Mixup [8] auxiliary dataset, we used a Beta distribution with α = 0.2. In Mixup augmentation, a value λ ∈ [0, 1] is sampled from a Beta distribution. We use a batch size of 64.
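The Mixup step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `mixup_pair` and the use of one-hot label vectors are assumptions; the core idea (sampling λ from Beta(α, α) with α = 0.2 and forming convex combinations of inputs and labels) follows the text.

```python
import numpy as np

def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mix two examples: sample lam ~ Beta(alpha, alpha) and return
    the convex combinations of inputs and (one-hot) labels."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # lam lies in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2       # mixed input
    y = lam * y1 + (1.0 - lam) * y2       # mixed (soft) label
    return x, y, lam
```

With α = 0.2 the Beta distribution is strongly bimodal, so most sampled λ are close to 0 or 1 and the mixed example stays near one of the originals.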


Functional Ensemble Distillation

Neural Information Processing Systems

One popular approach to alleviate this problem is using a Monte-Carlo estimation with an ensemble of models sampled from the posterior. However, this approach still comes at a significant computational cost, as one needs to store and run multiple models at test time.
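The Monte-Carlo ensemble estimate mentioned above can be sketched as follows. This is a minimal illustration under assumptions: the models are represented as hypothetical callables that each return a class-probability vector, and the predictive distribution is approximated by averaging their outputs, which is what makes test-time cost scale with the ensemble size.

```python
import numpy as np

def ensemble_predict(models, x):
    """Monte-Carlo predictive estimate: average the class-probability
    outputs of all posterior-sampled models on input x."""
    probs = np.stack([m(x) for m in models])  # shape (n_models, n_classes)
    return probs.mean(axis=0)                 # averaged predictive distribution
```

Every model in the list must be stored and evaluated per input, which is the computational cost the abstract refers to.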