Fourier-VLM: Compressing Vision Tokens in the Frequency Domain for Large Vision-Language Models
