Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models

Gen Luo

Neural Information Processing Systems 

Instead of using large neural networks to connect the image encoder and the LLM, Mixture-of-Modality Adaptation (MMA) adopts lightweight modules, i.e., adapters, to bridge the gap between LLMs and vision-language (VL) tasks. This design also enables joint optimization of the image and language models.
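The general adapter idea can be illustrated with a generic residual bottleneck adapter. This is a minimal sketch under stated assumptions: the class `BottleneckAdapter`, the bottleneck rank `r`, the GELU nonlinearity, and the zero-initialized up-projection follow the common bottleneck-adapter pattern and are hypothetical here, not taken from MMA's published code. MMA's actual modules, including how they handle image versus text inputs, may differ. It is written in pure Python only to stay self-contained; a real implementation would use a tensor library.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def gelu(x: float) -> float:
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


class BottleneckAdapter:
    """Hypothetical generic residual bottleneck adapter (not MMA's exact module).

    Projects a d_model-dimensional hidden state down to r dimensions,
    applies a nonlinearity, projects back up, and adds the result to the
    input. Only these two small matrices would be trained; the backbone
    they are inserted into stays frozen.
    """

    def __init__(self, d_model: int, r: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Down-projection (r x d_model): small random initialization.
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(r)]
        # Up-projection (d_model x r): zero initialization, so the adapter
        # starts out as an identity map and does not disturb the frozen model.
        self.w_up = [[0.0] * r for _ in range(d_model)]

    def __call__(self, h: Vector) -> Vector:
        z = [gelu(v) for v in matvec(self.w_down, h)]  # down-project + nonlinearity
        delta = matvec(self.w_up, z)                     # up-project back to d_model
        return [hi + di for hi, di in zip(h, delta)]     # residual connection

    def num_params(self) -> int:
        """Count trainable parameters (weights only; biases omitted for brevity)."""
        return sum(len(row) for row in self.w_down) + sum(len(row) for row in self.w_up)


if __name__ == "__main__":
    adapter = BottleneckAdapter(d_model=8, r=2)
    h = [0.1 * i for i in range(8)]
    print(adapter(h) == h)       # identity at initialization
    print(adapter.num_params())  # 2 * 8 * 2 trainable weights
```

The appeal of this design is its size: an adapter has 2 · d_model · r weights, so with a hypothetical d_model = 4096 and r = 8 it adds 65,536 trainable weights, compared with roughly 16.8 million for a single dense 4096 × 4096 projection.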
