Figure 1: Projecting 50-dimensional embeddings obtained by training a simple neural network without SSE (Left) and

Neural Information Processing Systems 

We thank the reviewers for their insightful feedback. In the following, we address their concerns and questions. It is indeed a great suggestion to examine concrete examples beyond the quantitative evaluation to get an intuition. That is likely due to the use of the item graph. As shown in Theorem 1, SSE can 'smooth' the Rademacher SSE-SE and perhaps we can further study how this is related to dropout in theory.