Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory