Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache
