Width and Depth Limits Commute in Residual Networks
–arXiv.org Artificial Intelligence
We show that taking the width and depth to infinity in a deep neural network with skip connections, when branches are scaled by $1/\sqrt{depth}$ (the only nontrivial scaling), results in the same covariance structure no matter how that limit is taken. This explains why the standard infinite-width-then-depth approach provides practical insights even for networks with depth of the same order as width. We also demonstrate that the pre-activations, in this case, have Gaussian distributions, which has direct applications in Bayesian deep learning. We conduct extensive simulations that show an excellent match with our theoretical findings.
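The scaling described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes a toy residual MLP with ReLU branches, the update $y_l = y_{l-1} + \frac{1}{\sqrt{depth}} W_l \,\mathrm{relu}(y_{l-1})$, and Gaussian weights with variance 1/width. The function names `resnet_preactivations` and `average_mean_square` and the all-ones input are hypothetical choices made for illustration. It estimates the mean squared pre-activation of the last layer, which can then be compared across different width and depth settings.

```python
import numpy as np


def resnet_preactivations(x, width, depth, rng):
    """Last-layer pre-activations of a toy residual MLP (illustrative assumption).

    Each residual branch is scaled by 1/sqrt(depth), as in the abstract.
    Weight entries are i.i.d. Gaussian with variance 1/fan_in.
    """
    d = x.shape[0]
    # Input layer: variance 1/d, so each coordinate has variance |x|^2 / d.
    y = rng.normal(0.0, 1.0 / np.sqrt(d), size=(width, d)) @ x
    for _ in range(depth):
        w = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
        # Skip connection plus a ReLU branch scaled by 1/sqrt(depth).
        y = y + (w @ np.maximum(y, 0.0)) / np.sqrt(depth)
    return y


def average_mean_square(width, depth, n_seeds=8):
    """Average of mean(y^2) at the last layer over independent random networks."""
    x = np.ones(16)  # hypothetical input with |x|^2 / d = 1
    values = []
    for seed in range(n_seeds):
        rng = np.random.default_rng(seed)
        y = resnet_preactivations(x, width, depth, rng)
        values.append(float(np.mean(y ** 2)))
    return float(np.mean(values))


if __name__ == "__main__":
    # Compare a wide-and-shallow setting with one where depth equals width.
    for width, depth in [(256, 16), (128, 128)]:
        print(width, depth, average_mean_square(width, depth))
```

Under these assumptions, the infinite-width variance recursion for ReLU multiplies the variance by $(1 + \frac{1}{2\,depth})$ at each layer, so the printed values should sit near $e^{1/2} \approx 1.65$ for both settings, up to finite-width fluctuations.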
Aug-10-2023
- Country:
- Asia > Singapore (0.04)
- North America > United States
- Hawaii > Honolulu County > Honolulu (0.04)
- Genre:
- Research Report > New Finding (0.93)
- Technology: