Reviews: Swapout: Learning an ensemble of deep architectures
Neural Information Processing Systems
Why can't you estimate the test-time statistics empirically on a validation set?

I really appreciate the tidbits on why dropout and swapout interact poorly with batch normalization. It's also useful to know that you don't have to average over very many sampled dropout (swapout) masks; this is a neat additional analysis and rather useful to the community.

Why do the authors train for exactly 196 and then 224 epochs before decaying the learning rate? Normally such specific choices would arouse suspicion, though in this case I expect it makes little difference (e.g. between 196 and a round number like 200).
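The test-time averaging the review refers to can be made concrete with a minimal NumPy sketch (my own illustration, not the authors' code): a single linear layer with inverted-dropout masking, where the test-time prediction is estimated by averaging a handful of stochastic forward passes. The layer sizes, `drop_p`, and sample count are arbitrary assumptions for illustration.

```python
import numpy as np

# Hypothetical toy setup: one linear layer with dropout on its outputs.
rng = np.random.default_rng(0)
d, k = 8, 16                      # input and hidden dimensions (arbitrary)
x = rng.normal(size=d)            # a single input example
w = rng.normal(size=(d, k))       # layer weights

def stochastic_forward(x, w, drop_p=0.5):
    # One stochastic pass: drop hidden units at random, with
    # inverted-dropout scaling so the expectation over masks
    # matches the deterministic (undropped) activation.
    h = x @ w
    mask = rng.random(k) >= drop_p
    return float((h * mask / (1.0 - drop_p)).sum())

def mc_prediction(x, w, n_samples=2000):
    # Test-time estimate: average predictions over sampled masks
    # instead of applying the deterministic weight-scaling rule.
    return sum(stochastic_forward(x, w) for _ in range(n_samples)) / n_samples

deterministic = float((x @ w).sum())   # exact expectation over masks
estimate = mc_prediction(x, w)
```

With enough samples the Monte Carlo estimate concentrates around the deterministic expectation; in practice (as the review notes) a surprisingly small number of samples already gives a usable estimate.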