

Supplementary for Emergence of Shape Bias in Convolutional Neural Networks through Activation Sparsity

1 Further Results of the Impact of Sparsity on Shape Bias Benchmark

Neural Information Processing Systems

We apply the sparsity operation proposed in Section 3.1 to ResNet-50. For ViT, we apply the spatial Top-K operation described in the general response. We observe an increase in shape bias for both the ResNet-50 and ViT-B architectures, further closing the gap between humans and existing models. This generalizes Section 4.2 of the main text to the ResNet-50 and ViT-B architectures (Figure 1). ResNet-50 uses the same sparsity definition as AlexNet and VGG. For ViT-B, we reshape the intermediate activation from [n, h * w, d] to [n, d, h * w] and apply the Top-K selection over dimension 2 before the activation is passed through the multi-head attention. Here n denotes the batch size, and h and w are the height and width of the latent tensor after reshaping it to 2D; for ViT-B with patch size 16 on 224x224 images, h = w = 14.
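The reshaping described above can be illustrated with a minimal NumPy sketch of a spatial Top-K operation. It assumes that Top-K keeps the k largest activations of each channel (unchanged in value) and zeros all others; the helper names and the parameter `k` are hypothetical, since the actual value of k and the exact definition live in Section 3.1 and the general response, which are not reproduced here. The ViT variant assumes the input holds only the h * w patch tokens (any class token separated out beforehand), matching the [n, h * w, d] shape given in the text.

```python
import numpy as np


def _topk_mask_last_axis(a: np.ndarray, k: int) -> np.ndarray:
    """Boolean mask marking the k largest entries along the last axis (ties broken arbitrarily)."""
    idx = np.argpartition(a, -k, axis=-1)[..., -k:]  # unordered indices of the top k
    mask = np.zeros(a.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    return mask


def spatial_topk_conv(x: np.ndarray, k: int) -> np.ndarray:
    """CNN feature map [n, c, h, w]: for each sample and channel, keep the k largest
    activations over the h*w spatial positions and set the rest to zero (assumed definition)."""
    n, c, h, w = x.shape
    flat = x.reshape(n, c, h * w)
    out = np.where(_topk_mask_last_axis(flat, k), flat, 0.0)
    return out.reshape(n, c, h, w)


def spatial_topk_vit(tokens: np.ndarray, k: int) -> np.ndarray:
    """ViT activations [n, h*w, d] (patch tokens only, an assumption): reshape to
    [n, d, h*w], apply Top-K over dimension 2, then return to [n, h*w, d]."""
    x = np.transpose(tokens, (0, 2, 1))  # [n, d, h*w]
    out = np.where(_topk_mask_last_axis(x, k), x, 0.0)
    return np.transpose(out, (0, 2, 1))  # back to [n, h*w, d]
```

For example, with `rng = np.random.default_rng(0)`, calling `spatial_topk_vit(rng.standard_normal((2, 196, 768)), k=20)` leaves 20 non-zero tokens per feature dimension for ViT-B/16 on 224x224 inputs (h = w = 14, so h * w = 196). Selecting along the token axis rather than the feature axis is what makes the operation spatial: each feature keeps only its most strongly responding image locations.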


Spatial-frequency channels, shape bias, and adversarial robustness: Supplementary material

A Human psychophysics

Neural Information Processing Systems

Figure 1 shows screenshots from our online psychophysical critical band masking experiment. Accuracy heatmaps computed for different observers in our experiment showed little individual variation (Figure 1), and the variation in threshold noise SD for 50% accuracy was even smaller. Table 1 compares the value of each channel property computed from Gaussian fits to the averaged human data with the values obtained by summarizing Gaussian fits to individual human data. Because the two are similar for all channel properties, we use the former for all human data reported in the main paper. Our existing method for computing thresholds and fitting the Gaussian function to them is difficult to apply to observers with very high noise sensitivity (low efficiency), since it relies on good performance in the baseline (zero-noise) condition.
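The threshold-and-fit procedure mentioned above can be sketched as follows, under several assumptions not stated in this excerpt: thresholds are found by linear interpolation of accuracy against log noise SD, noise sensitivity is taken as the reciprocal of the threshold noise SD, and the channel is a Gaussian in log2 spatial frequency. The fit here is a swapped-in technique (a least-squares parabola on log sensitivity, which recovers the Gaussian exactly for noise-free data), not necessarily the authors' fitting routine, and all function names are hypothetical. The `nan` return illustrates why poor baseline performance breaks the method.

```python
import numpy as np


def threshold_noise_sd(noise_sds, accuracy, criterion=0.5):
    """Noise SD at which accuracy falls to `criterion`, interpolating in log noise SD.

    Assumes positive noise SDs (zero-noise baseline excluded) and accuracy that
    decreases monotonically with noise. Returns nan if the criterion is never crossed,
    e.g. when baseline performance is already poor.
    """
    noise_sds = np.asarray(noise_sds, dtype=float)
    accuracy = np.asarray(accuracy, dtype=float)
    if accuracy.max() < criterion or accuracy.min() > criterion:
        return float("nan")
    # np.interp needs increasing x values, so reverse the (decreasing) accuracy curve.
    log_sd = np.interp(criterion, accuracy[::-1], np.log(noise_sds)[::-1])
    return float(np.exp(log_sd))


def fit_channel(center_freqs, thresholds):
    """Fit sensitivity = 1 / threshold with a Gaussian in log2 frequency.

    log(A * exp(-(x - mu)^2 / (2 s^2))) is a parabola a*x^2 + b*x + c, so the
    Gaussian parameters follow from a degree-2 polynomial fit.
    """
    x = np.log2(np.asarray(center_freqs, dtype=float))
    y = np.log(1.0 / np.asarray(thresholds, dtype=float))
    a, b, c = np.polyfit(x, y, 2)
    if a >= 0:
        raise ValueError("sensitivity is not peaked; Gaussian fit undefined")
    sigma = np.sqrt(-1.0 / (2.0 * a))
    mu = -b / (2.0 * a)
    return {
        "peak_frequency": float(2.0 ** mu),
        "bandwidth_octaves": float(2.0 * np.sqrt(2.0 * np.log(2.0)) * sigma),  # FWHM
        "peak_sensitivity": float(np.exp(c - b ** 2 / (4.0 * a))),
    }
```

Because x is measured in log2 units, the full width at half maximum of the fitted Gaussian is directly a bandwidth in octaves, the unit in which channel bandwidths are usually reported.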


Spatial-frequency channels, shape bias, and adversarial robustness

Neural Information Processing Systems

What spatial frequency information do humans and neural networks use to recognize objects? In neuroscience, critical band masking is an established tool that can reveal the frequency-selective filters used for object recognition. Critical band masking measures the sensitivity of recognition performance to noise added at each spatial frequency. Existing critical band masking studies show that humans recognize periodic patterns (gratings) and letters by means of a spatial-frequency filter (or "channel") that has a frequency bandwidth of one octave (doubling of frequency). Here, we introduce critical band masking as a task for network-human comparison and test 14 humans and 76 neural networks on 16-way ImageNet categorization in the presence of narrowband noise.
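To make the masking noise concrete, here is a minimal sketch of one-octave narrowband noise: white Gaussian noise is filtered in the Fourier domain to an annulus of radial frequency whose upper edge is twice its lower edge, then rescaled to a target SD. The hard-edged annulus, the geometric-mean definition of the center frequency, and the function name are assumptions for illustration; the study's actual noise filter is not specified in this excerpt.

```python
import numpy as np


def octave_band_noise(size: int, center_cpi: float, noise_sd: float,
                      rng: np.random.Generator) -> np.ndarray:
    """Gaussian noise confined to one octave of radial spatial frequency.

    The band spans center/sqrt(2) to center*sqrt(2) cycles per image, a factor of 2
    (one octave) whose geometric mean is `center_cpi`.
    """
    white = rng.standard_normal((size, size))
    f = np.fft.fftfreq(size) * size  # frequencies in cycles per image
    fy, fx = np.meshgrid(f, f, indexing="ij")
    radius = np.hypot(fy, fx)
    band = (radius >= center_cpi / np.sqrt(2.0)) & (radius < center_cpi * np.sqrt(2.0))
    # The mask is symmetric under f -> -f, so the inverse FFT is real up to rounding.
    noise = np.real(np.fft.ifft2(np.fft.fft2(white) * band))
    return noise * (noise_sd / noise.std())
```

A masked stimulus is then, for example, `image + octave_band_noise(224, 16.0, 0.1, rng)` for a grayscale 224x224 image; sweeping the center frequency and the noise SD yields the accuracy-by-frequency-by-noise grid from which thresholds are computed.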




db5f9f42a7157abe65bb145000b5871a-Paper.pdf

Neural Information Processing Systems

Recent work has indicated that, unlike humans, ImageNet-trained CNNs tend to classify images by texture rather than by shape. How pervasive is this bias, and where does it come from? We find that, when trained on datasets of images with conflicting shape and texture, CNNs learn to classify by shape at least as easily as by texture. What factors, then, produce the texture bias in CNNs trained on ImageNet?
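Shape and texture bias on cue-conflict images is commonly summarized with the shape-bias ratio of Geirhos et al. (2019): among trials where the model chooses either the shape class or the texture class, the fraction that are shape decisions. The sketch below assumes that definition, including its convention of skipping images whose shape and texture classes coincide; the function name is hypothetical and the metric is not taken from the paper excerpted here.

```python
from typing import Hashable, Sequence


def shape_bias(predictions: Sequence[Hashable],
               shape_labels: Sequence[Hashable],
               texture_labels: Sequence[Hashable]) -> float:
    """Fraction of shape decisions among trials decided by either shape or texture."""
    shape_hits = texture_hits = 0
    for pred, shape, texture in zip(predictions, shape_labels, texture_labels):
        if shape == texture:
            continue  # no conflict between cues, so the trial is uninformative
        if pred == shape:
            shape_hits += 1
        elif pred == texture:
            texture_hits += 1
        # predictions matching neither cue are ignored
    total = shape_hits + texture_hits
    if total == 0:
        raise ValueError("no trials were classified by either shape or texture")
    return shape_hits / total
```

A value of 1.0 means every cue-resolved decision followed shape, 0.0 means every one followed texture; predictions matching neither class do not affect the ratio, which separates cue preference from overall accuracy.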





Emergence of Shape Bias in Convolutional Neural Networks through Activation Sparsity

Neural Information Processing Systems

Current deep-learning models for object recognition are known to be heavily biased toward texture. In contrast, the human visual system is known to be biased toward shape and structure. What design principles in the human visual system could lead to this difference? How can we introduce more shape bias into deep learning models? In this paper, we report that sparse coding, a ubiquitous principle in the brain, can by itself introduce shape bias into the network.