Batch Normalization Orthogonalizes Representations in Deep Random Networks

Neural Information Processing Systems 

More precisely, under a mild assumption, we prove that the deviation of the representations from orthogonality rapidly decays with depth up to a term inversely proportional to the network width. This result has two main implications: 1) Theoretically, as the depth grows, the distribution of the representation, after the linear layers, contracts to a Wasserstein-2 ball around an isotropic Gaussian distribution.
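The decay described above can be observed numerically. The following is a minimal sketch, not the authors' code. It assumes a linear network with i.i.d. Gaussian weights, a batch normalization that only rescales each feature across the batch (no mean subtraction, no learned parameters), and one plausible measure of the deviation from orthogonality: the Frobenius distance between the trace-normalized Gram matrix of the batch and I/n. The names `batch_norm`, `orthogonality_gap`, and `run`, as well as all default sizes, are hypothetical and chosen for illustration only.

```python
import numpy as np


def batch_norm(h, eps=1e-12):
    """Rescale each feature (row) to unit root-mean-square across the batch (columns).

    Assumption: no mean subtraction and no learned scale or shift.
    """
    scale = np.sqrt(np.mean(h ** 2, axis=1, keepdims=True))
    return h / (scale + eps)


def orthogonality_gap(h):
    """Frobenius distance between the trace-normalized Gram matrix and I/n.

    Assumption: this is one natural way to quantify deviation from
    orthogonality; the paper's exact definition may differ.
    """
    n = h.shape[1]
    gram = h.T @ h  # n x n Gram matrix of the batch
    return np.linalg.norm(gram / np.trace(gram) - np.eye(n) / n)


def run(width=256, batch=8, depth=30, seed=0):
    """Track the gap through a random linear network with batch normalization."""
    rng = np.random.default_rng(seed)
    # Start from strongly correlated inputs: a shared direction plus small noise.
    base = rng.standard_normal((width, 1))
    h = base + 0.1 * rng.standard_normal((width, batch))
    gaps = [orthogonality_gap(h)]
    for _ in range(depth):
        # Fresh Gaussian weights at every layer, scaled by 1/sqrt(width).
        w = rng.standard_normal((width, width)) / np.sqrt(width)
        h = batch_norm(w @ h)
        gaps.append(orthogonality_gap(h))
    return gaps


if __name__ == "__main__":
    gaps = run()
    for layer in (0, 1, 2, 5, 10, 30):
        print(f"layer {layer:2d}: gap = {gaps[layer]:.4f}")
```

Under these assumptions the gap should drop quickly over the first few layers and then level off at a small value rather than reaching zero. Increasing `width` should lower that plateau, in line with the width-dependent term in the stated result.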
