Supplementary for Emergence of Shape Bias in Convolutional Neural Networks through Activation Sparsity

Neural Information Processing Systems 

We generalize Section 4.2 of the main text to the ResNet-50 and ViT-B architectures (Figure 1). For ResNet-50 we apply the sparsity operation proposed in Section 3.1, with the same sparsity definition as for AlexNet and VGG. For ViT-B, we additionally apply the spatial Top-K operation described in the general response: we reshape the intermediate activation from [n, h * w, d] to [n, d, h * w] and apply the Top-K selection over dimension 2 before the activation is passed through the multi-head attention. (Here h and w are the height and width of the latent tensor after reshaping it to 2D; for ViT-B with patch size 16 on 224x224 images, h = w = 14, and n denotes the batch size.) With these operations, we observe an increase in shape bias for both ResNet-50 and ViT-B, further closing the gap between humans and existing models.
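As a concrete illustration, below is a minimal PyTorch sketch of this spatial Top-K operation under our own assumptions: the function name spatial_topk and the value of k are hypothetical, we realize Top-K selection by zeroing all non-selected responses, and we assume the CLS token has already been removed from the token sequence. This is not the paper's released code.

```python
import torch


def spatial_topk(x: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k largest activations per channel across spatial tokens.

    x: ViT activation of shape [n, h*w, d] (spatial tokens only; the CLS
    token is assumed to be excluded). Returns a tensor of the same shape
    with all but the top-k spatial responses of each channel zeroed out.
    """
    # Reshape [n, h*w, d] -> [n, d, h*w] so dimension 2 indexes spatial positions.
    x_t = x.transpose(1, 2)
    # k-th largest value per (sample, channel); shape [n, d, 1] for broadcasting.
    thresh = x_t.topk(k, dim=2).values[..., -1:]
    # Zero out every spatial response below the per-channel threshold
    # (ties at the threshold are all kept in this simple sketch).
    x_t = torch.where(x_t >= thresh, x_t, torch.zeros_like(x_t))
    # Restore the original [n, h*w, d] token layout.
    return x_t.transpose(1, 2)


if __name__ == "__main__":
    n, h, w, d = 2, 14, 14, 768        # ViT-B/16 on 224x224 images: h = w = 14
    x = torch.randn(n, h * w, d)
    out = spatial_topk(x, k=20)        # keep 20 of the 196 spatial responses per channel
    print(out.shape)                   # torch.Size([2, 196, 768])
```

In this sketch the sparsified activation would be fed to the multi-head attention in place of the dense one; the selected values pass through unchanged, so only the sparsity pattern differs from the standard forward pass.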