Batch Normalization Orthogonalizes Representations in Deep Random Networks
Neural Information Processing Systems
More precisely, under a mild assumption, we prove that the deviation of the representations from orthogonality decays rapidly with depth, up to a term inversely proportional to the network width. This result has two main implications: 1) Theoretically, as the depth grows, the distribution of the representations after the linear layers contracts to a Wasserstein-2 ball around an isotropic Gaussian distribution.
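The claim about depth and width can be illustrated numerically. The sketch below is not the authors' code, and it rests on several assumptions that the abstract does not state: the layers are purely linear with i.i.d. Gaussian weights, batch normalization rescales each feature to unit root-mean-square over the batch with no mean subtraction and no learnable parameters, and the deviation from orthogonality is measured as the Frobenius distance between the trace-normalized Gram matrix of the batch and the scaled identity. The names `batch_norm`, `orthogonality_gap`, and `simulate` are hypothetical.

```python
import numpy as np


def batch_norm(h, eps=1e-12):
    """Scale each feature (row) to unit RMS over the batch (columns).

    Assumption: no mean subtraction and no learnable scale or shift.
    """
    scale = np.sqrt(np.mean(h ** 2, axis=1, keepdims=True)) + eps
    return h / scale


def orthogonality_gap(h):
    """Frobenius distance between the trace-normalized Gram matrix and I/n.

    h has shape (width, batch); the Gram matrix is batch x batch.
    Assumption: this is one reasonable way to quantify the deviation.
    """
    n = h.shape[1]
    gram = h.T @ h
    return np.linalg.norm(gram / np.trace(gram) - np.eye(n) / n)


def simulate(width=256, batch=8, depth=30, seed=0):
    """Return the gap at the input and after each of `depth` random layers."""
    rng = np.random.default_rng(seed)
    # Start from nearly collinear samples so the initial gap is large.
    base = rng.standard_normal((width, 1))
    h = base + 0.1 * rng.standard_normal((width, batch))
    gaps = [orthogonality_gap(h)]
    for _ in range(depth):
        # Random linear layer followed by batch normalization.
        w = rng.standard_normal((width, width)) / np.sqrt(width)
        h = batch_norm(w @ h)
        gaps.append(orthogonality_gap(h))
    return gaps


if __name__ == "__main__":
    for width in (64, 512):
        gaps = simulate(width=width)
        print(f"width={width}: input gap={gaps[0]:.3f}, "
              f"final gap={gaps[-1]:.3f}")
```

Under these assumptions, the gap should fall quickly over the first few layers and then level off at a residual value that shrinks as `width` grows, which is the qualitative behaviour the abstract describes. The sketch does not verify the stated rate.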
Feb-7-2026, 22:44:01 GMT