Mixing tokens with Fourier transforms to improve the efficiency of large language models
