Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight
Biao Gong
Neural Information Processing Systems
This paper introduces Chain-of-Sight, a vision-language bridge module that accelerates the pre-training of Multimodal Large Language Models (MLLMs). Our approach employs a sequence of visual resamplers that capture visual details at various spatial scales.
May-25-2025, 08:33:03 GMT