Attention-Only Transformers and Implementing MLPs with Attention Heads