A Appendix

Neural Information Processing Systems 

A.2.5 Loss Function and Optimizer

For all the experiments except N-MNIST in this work, we use cross entropy and stochastic gradient