Fourier Head: Helping Large Language Models Learn Complex Probability Distributions