Large Language Models are Visual Reasoning Coordinators

Liangyu Chen, Bo Li

Neural Information Processing Systems 

Existing methods such as ensembling still struggle to aggregate multiple vision-language models (VLMs) with the desired higher-order communication. In this work, we propose Cola, a novel paradigm that coordinates multiple VLMs for visual reasoning.
