Appendix A: Proofs

Neural Information Processing Systems 

This appendix contains the proofs of Lemma 3.1 and Theorem 3.2. We also restate Lemma 3.1 and Theorem 3.2 using the new notation introduced here, so that this appendix is self-contained. Under the new notation, Equation (2) in Section 3.2 becomes:

f(x) = t \|x\|_1 - \|x - \bar{x} \cdot \mathbf{1}_n\|_1

By Lemma A.2, the optimal solution of (7) can be found on the vertices of the feasible region. Now we consider the discreteness constraint.

The detailed parameters of the training and pruning stages of our method are listed in Table 4. The main building block of ResNet-50 is the bottleneck block [He et al., 2016], as shown in Figure 5. The scaling factors of a bottleneck block are already 0 or very close to 0, so we do not apply any extra regularization there.

We visualize the layer-wise distribution of scaling factors in Figure 6, which compares the baseline ResNet-50 model against the model trained with our polarization regularizer on the ImageNet dataset.

Figure 6: Comparison of the layer-wise scaling factor distributions between the baseline ResNet-50 model and the model trained with our polarization regularizer on the ImageNet dataset.
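As a minimal sketch of the objective above, assuming the regularizer has the polarization form f(x) = t‖x‖₁ − ‖x − x̄·1ₙ‖₁ (the function name and the value of t below are illustrative, not from the paper):

```python
def polarization_reg(x, t=1.2):
    """Illustrative polarization-style regularizer:
    f(x) = t * ||x||_1 - ||x - mean(x)||_1.

    The L1 term pushes all scaling factors toward 0; subtracting the
    distance-to-mean term rewards factors that spread away from the
    mean, so minimizing f drives the factors into two groups
    (near 0 vs. clearly nonzero).
    """
    n = len(x)
    mean = sum(x) / n
    l1 = sum(abs(v) for v in x)
    spread = sum(abs(v - mean) for v in x)
    return t * l1 - spread

# A polarized vector scores lower than a uniform one with the same L1 norm:
polarized = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]  # f = 1.2*3 - 3 = 0.6
uniform = [0.5] * 6                          # f = 1.2*3 - 0 = 3.6
```

This is why minimizing the regularized loss separates scaling factors into a prunable group near 0 and a retained group away from 0.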

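The per-layer comparison in Figure 6 can be reproduced in spirit with a simple binning routine over each layer's scaling factors; the function below is an illustrative sketch (the function name, bin count, and value range are assumptions, not code from the paper):

```python
def layer_histogram(factors, bins=10, lo=0.0, hi=2.0):
    """Bucket one layer's scaling factors into `bins` equal-width
    bins over [lo, hi); values at or above `hi` land in the last bin.
    A polarized layer shows mass concentrated in the first bin
    (prunable channels) and in the upper bins (retained channels)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for f in factors:
        idx = min(int((f - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

# Toy polarized layer: two factors near 0, two clearly nonzero.
hist = layer_histogram([0.0, 0.05, 1.9, 1.95])
```

Running this per layer and stacking the histograms gives the kind of layer-wise distribution plot shown in Figure 6.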