Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model

Neural Information Processing Systems 

Despite its importance, the vision-language connector has been relatively less explored.
