Reviews: Sigsoftmax: Reanalysis of the Softmax Bottleneck

Neural Information Processing Systems 

The paper analyzes the ability of the soft-max, when used as the output activation function of a neural network, to approximate the posterior distribution. The problem is translated into a study of the rank of the matrices containing the log-probabilities computed by the analyzed activation layer. It is shown that the soft-max does not increase the rank of the input response matrix. The authors propose to replace the soft-max with the so-called sigsoftmax. It is shown that the rank of the sigsoftmax matrix is not less than the rank of the soft-max matrix.
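The rank argument summarized above can be checked numerically. The sketch below is only an illustration, not the authors' code. It assumes that sigsoftmax is the normalized product exp(z) * sigmoid(z), a definition this review text does not spell out. The matrix sizes, the logit scale, and the rank tolerance are hypothetical choices. The script builds logits of rank at most d, then compares the numerical rank of the log-probability matrices produced by the two activations. For the soft-max, the log-output is the logits minus a per-row normalization term, so its rank can exceed d by at most one.

```python
import numpy as np

def log_softmax(z):
    # Row-wise log-softmax, shifted by the row maximum for numerical stability.
    shifted = z - z.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def log_sigsoftmax(z):
    # Assumed definition: sigsoftmax(z)_i is proportional to exp(z_i) * sigmoid(z_i).
    # log(exp(z) * sigmoid(z)) = z + log(sigmoid(z)) = z - log(1 + exp(-z)).
    unnormalized = z - np.logaddexp(0.0, -z)
    # Normalizing in log space is the same as applying log-softmax.
    return log_softmax(unnormalized)

# Hypothetical sizes: d hidden units, 50 contexts, 20 output classes.
rng = np.random.default_rng(0)
d, n_contexts, n_classes = 2, 50, 20
hidden = rng.normal(size=(n_contexts, d))
weights = rng.normal(size=(n_classes, d))
logits = 3.0 * hidden @ weights.T  # rank at most d by construction

tol = 1e-8  # illustrative absolute threshold on singular values
rank_logits = np.linalg.matrix_rank(logits, tol=tol)
rank_softmax = np.linalg.matrix_rank(log_softmax(logits), tol=tol)
rank_sigsoftmax = np.linalg.matrix_rank(log_sigsoftmax(logits), tol=tol)

print("rank of logits:          ", rank_logits)
print("rank of log-softmax:     ", rank_softmax)
print("rank of log-sigsoftmax:  ", rank_sigsoftmax)
```

The element-wise log-sigmoid term is what lets the sigsoftmax log-output leave the low-rank subspace of the logits. How far the numerical rank rises depends on the logit scale and on the tolerance used.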