Pre-trained Large Language Models Use Fourier Features to Compute Addition
