Variance-Preserving Initialization Schemes Improve Deep Network Training: But Which Variance is Preserved?
