Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
Gen Luo
Neural Information Processing Systems
Instead of using large neural networks to connect the image encoder and the LLM, MMA adopts lightweight modules, i.e., adapters, to bridge the gap between LLMs and vision-language (VL) tasks, which also enables the joint optimization of the image and language
Oct-8-2025, 18:47:43 GMT
- Country:
- Asia > China
- Fujian Province > Xiamen (0.04)
- Guangdong Province > Shenzhen (0.04)
- Europe > Romania
- Genre:
- Research Report (0.46)
- Technology: