InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
Xiaoyi Dong

Neural Information Processing Systems 

The Large Vision-Language Model (LVLM) field has seen significant advances, yet its progress has been hindered by difficulty in comprehending fine-grained visual content due to limited input resolution.
