Matryoshka Query Transformer for Large Vision-Language Models
Neural Information Processing Systems
Large Vision-Language Models (LVLMs) typically encode an image into a fixed number of visual tokens (e.g., 576) and process these tokens with a language model.
May-29-2025, 15:42:54 GMT
- Country:
- North America > United States > California (0.14)
- Genre:
- Research Report > Experimental Study (1.00)
- Industry:
- Government (0.68)
- Technology: