Simplicity Bias of Transformers to Learn Low Sensitivity Functions
