Compress & Cache: Vision token compression for efficient generation and retrieval

Jun-15-2026, 23:55:33 GMT–Neural Information Processing Systems

This work aims to compress the vision tokens of an LVLM into a representation that is simultaneously suitable for (a) generative and (b) discriminative tasks, (c) is nearly lossless, and (d) storage-efficient. To this end, we propose C&C, a novel compression method that leverages the LVLM itself for task-agnostic visual token compression. Unlike prior methods that perform token reduction on-the-fly, our approach offloads computation to a dedicated, upfront indexing stage, effectively decoupling compression from generation. This enables learning more powerful representations for generation during inference. At the core of C&C is a "doubleforward pass" training strategy. During the first forward pass, the LLM (of the LVLM) creates a bottleneck by compressing the dense visual tokens into a few summary tokens.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Jun-15-2026, 23:55:33 GMT

Conferences PDF

Add feedback

Genre:
- Research Report > Experimental Study (1.00)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Artificial Intelligence
    - Natural Language > Large Language Model (1.00)
    - Vision (0.95)
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found