Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
