VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

Wenhai Wang, Zhe Chen, Xiaokang Chen, Jiannan Wu

Neural Information Processing Systems 

It's noteworthy that, with a generalist LLM-based framework, our model can achieve over 60% mAP on COCO, on par with
