Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions