Convergence of SGD for Training Neural Networks with Sliced Wasserstein Losses