Appendices

Neural Information Processing Systems 

The supplementary material is organized as follows. We first discuss additional related work and provide experiment details in Section 2 and Appendix B, respectively. We evaluate the extent to which ensemble methods and adversarial training mitigate Simplicity Bias (SB) in Appendix E. Finally, we provide the proof of Theorem 1 ( k 1).

Also recall that each dataset comprises at most one "simple" feature. Our results in Section 4 hold on all three MNIST-CIFAR datasets.

In this section, we supplement our results in Section 4 of the paper by showing that extreme SB persists across several model architectures and on synthetic as well as image-based datasets. Now, we study the effect of the activation function and the optimizer on extreme SB.