 crazy-jack nips2023


Supplementary for Emergence of Shape Bias in Convolutional Neural Networks through Activation Sparsity

1 Further Results of the Impact of Sparsity on Shape Bias Benchmark

Neural Information Processing Systems

We utilize the sparsity operation proposed in Section 3.1 for ResNet-50. For ViT, we also apply the spatial Top-K operation as described in the general response. We observe an increase in shape bias for both the ResNet-50 and ViT-B architectures, further closing the gap between humans and existing models. We generalize Section 4.2 in the main text to ResNet-50 and ViT-B architectures (Figure 1). The sparsity definition for ResNet-50 is the same as for AlexNet and VGG. For ViT-B, we reshape the intermediate activation response from [n, h * w, d] to [n, d, h * w] and apply the Top-K selection over dimension 2 before the activation is passed through the multi-head attention. (Here h and w are the height and width of the latent tensor after reshaping it to 2D, and n denotes the batch size; for ViT-B with patch size 16 on 224x224 images, h = w = 14.)
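The ViT-B step described above can be sketched in NumPy as follows. This is only an illustrative sketch, not the authors' implementation. It assumes that the change from [n, h * w, d] to [n, d, h * w] is an axis swap (transpose), that Top-K selects by raw activation value, and that non-selected entries are set to zero. The function name `spatial_top_k` and the `keep_ratio` parameter are hypothetical.

```python
import numpy as np


def spatial_top_k(tokens, keep_ratio):
    """Keep the k largest responses of each channel across spatial positions.

    tokens: array of shape [n, h*w, d] (batch, spatial tokens, channels).
    keep_ratio: hypothetical parameter, fraction of positions kept per channel.
    Returns an array of the same shape with non-selected entries set to zero.
    """
    n, hw, d = tokens.shape
    k = max(1, int(round(keep_ratio * hw)))

    # Assumption: [n, h*w, d] -> [n, d, h*w] is an axis swap, so that
    # dimension 2 indexes the spatial positions of one channel.
    x = np.transpose(tokens, (0, 2, 1))

    # The k-th largest value along dimension 2 serves as a per-channel threshold.
    kth = np.partition(x, hw - k, axis=2)[:, :, hw - k][:, :, None]

    # Zero everything below the threshold (ties at the threshold are all kept).
    sparse = np.where(x >= kth, x, 0.0)

    # Restore the original token layout [n, h*w, d].
    return np.transpose(sparse, (0, 2, 1))


# ViT-B with patch size 16 on 224x224 images gives h = w = 14, i.e. 196 tokens.
tokens = np.random.randn(2, 14 * 14, 8)
sparse_tokens = spatial_top_k(tokens, keep_ratio=0.2)
```

In a real model the output would then be passed on to the multi-head attention; the channel count of 8 here is just a small placeholder.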


We apply the sparsity layer in a subset of the network. This is based on the intuition that the brain utilizes sparsity for long-range communication but can allow local dense computation. We divide the networks into chunks: within each chunk, the neurons' activities are allowed to be dense (kept as in the original network), but the communication across different chunks is set to be sparse.
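The chunking idea can be sketched as below, assuming feature maps of shape [n, c, h, w], a spatial Top-K per channel as the sparsity operation, and sparsification only at the boundaries between consecutive chunks. The names `top_k_spatial`, `run_chunked`, and `keep_ratio` are hypothetical, and the toy "layers" stand in for real network blocks; how the authors actually partition ResNet-50 or ViT-B is not specified here.

```python
import numpy as np


def top_k_spatial(x, keep_ratio):
    """Per channel, keep the k largest spatial responses of x ([n, c, h, w])."""
    n, c, h, w = x.shape
    k = max(1, int(round(keep_ratio * h * w)))
    flat = x.reshape(n, c, h * w)
    # k-th largest value per (sample, channel) acts as the threshold.
    kth = np.partition(flat, h * w - k, axis=2)[:, :, h * w - k][:, :, None]
    return np.where(flat >= kth, flat, 0.0).reshape(n, c, h, w)


def run_chunked(x, chunks, keep_ratio):
    """Run layers densely inside each chunk, sparsify between chunks.

    chunks: list of chunks, each a list of callables mapping array -> array.
    """
    for i, chunk in enumerate(chunks):
        for layer in chunk:
            x = layer(x)  # dense computation inside the chunk
        if i < len(chunks) - 1:
            # Assumption: only the signal passed to the next chunk is sparse.
            x = top_k_spatial(x, keep_ratio)
    return x


# Toy example: two chunks of placeholder layers on a random feature map.
chunks = [
    [lambda v: v + 1.0, lambda v: v * 0.5],
    [lambda v: v * 2.0],
]
features = np.random.rand(1, 3, 8, 8)
output = run_chunked(features, chunks, keep_ratio=0.25)
```

With a single chunk the function leaves the network fully dense, which matches the idea that sparsity is applied only to communication across chunks.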