Residual Networks Behave Like Ensembles of Relatively Shallow Networks

Neural Information Processing Systems 

For example, most of the gradient in a residual network with 110 layers comes from paths that are only 10-34 layers deep.
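
One way to see why short paths can matter is to count paths by length. The sketch below is illustrative only and assumes details not stated above: that each residual block is either traversed or skipped, so a network with n blocks contains 2^n paths, and that the 110-layer network consists of roughly 54 blocks of 2 layers each. The names `path_length_distribution` and `NUM_BLOCKS` are hypothetical. The code counts paths; it does not measure gradients.

```python
from math import comb


def path_length_distribution(num_blocks: int) -> list[float]:
    """Return the fraction of all 2**num_blocks paths that traverse exactly k blocks.

    Assumes every block is independently either traversed or skipped,
    so the number of paths through k blocks is C(num_blocks, k).
    """
    total_paths = 2 ** num_blocks
    return [comb(num_blocks, k) / total_paths for k in range(num_blocks + 1)]


# Assumption (not from the text above): 54 residual blocks, 2 layers per block.
NUM_BLOCKS = 54
LAYERS_PER_BLOCK = 2

dist = path_length_distribution(NUM_BLOCKS)

# Average path length, measured in blocks and in layers.
mean_blocks = sum(k * p for k, p in enumerate(dist))
mean_layers = mean_blocks * LAYERS_PER_BLOCK

# Fraction of paths that are 10-34 layers deep, i.e. 5 to 17 blocks.
short_share = sum(dist[5:18])

print(f"mean path length: {mean_layers:.0f} layers")
print(f"share of paths that are 10-34 layers deep: {short_share:.4f}")
```

Under these assumptions the path lengths are binomially distributed and centered well above the 10-34 layer range, so the paths named in the claim are a small minority of all paths. Read this way, the claim is about where gradient magnitude concentrates, not about where most paths lie.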
