Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions
