Reviews: DropMax: Adaptive Variational Softmax

Neural Information Processing Systems 

This paper proposes applying dropout to the output softmax layer during supervised training of neural network classifiers. The dropout probabilities are adapted per example: they are computed as a function of the classifier's penultimate layer, so that layer produces both the class logits and the gating for those logits. The model combines ideas from adaptive dropout (Ba and Frey, NIPS 2013) and variational dropout (Kingma et al., NIPS 2015). The key problem being solved is how to perform inference to obtain the optimal dropout probabilities.
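To make the gating mechanism concrete, here is a minimal NumPy sketch of the forward pass as I understand it: the penultimate features feed two linear heads, one for the logits and one for per-class retain probabilities, and sampled Bernoulli masks gate the softmax. All weight names (`W_logit`, `W_gate`, etc.) are illustrative and not from the paper.

```python
import numpy as np

def dropmax_forward(h, W_logit, b_logit, W_gate, b_gate, rng):
    """Illustrative sketch of a DropMax-style forward pass.

    h: (batch, features) penultimate-layer activations.
    The same features produce both the class logits and the
    per-example, per-class retain probabilities.
    """
    z = h @ W_logit + b_logit                             # class logits
    rho = 1.0 / (1.0 + np.exp(-(h @ W_gate + b_gate)))    # retain probs in (0, 1)
    m = (rng.random(rho.shape) < rho).astype(float)       # sampled Bernoulli gates
    e = m * np.exp(z - z.max(axis=-1, keepdims=True))     # masked exp-logits
    e_sum = e.sum(axis=-1, keepdims=True)
    # Fall back to uniform if every class was dropped for an example
    # (a guard added here for the sketch, not part of the paper).
    p = np.where(e_sum > 0, e / np.maximum(e_sum, 1e-12), 1.0 / z.shape[-1])
    return p
```

The gating means the softmax is normalized only over the retained classes, which is what distinguishes this from standard dropout applied to the logits themselves.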