MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model

Neural Information Processing Systems 

This dual approach bridges the semantic gap between visual and textual data, thereby improving the model's ability to process and interpret information from multiple images effectively.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found