Matryoshka Query Transformer for Large Vision-Language Models

Mar-20-2026, 18:43:52 GMT – Neural Information Processing Systems

Large Vision-Language Models (LVLMs) typically encode an image into a fixed number of visual tokens (e.g., 576) and process these tokens with a language model.

Technology:
- Information Technology > Artificial Intelligence (1.00)