KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension

Jie Yang (1,2,5), Wang Zeng

Neural Information Processing Systems 

Recent advancements in Multimodal Large Language Models (MLLMs) have greatly improved their abilities in image understanding. However, these models often struggle with grasping pixel-level semantic details, e.g., the keypoints of an object. To bridge this gap, we introduce the novel challenge of Semantic Keypoint Comprehension, which aims to comprehend keypoints across different task scenarios, including keypoint semantic understanding, visual prompt-based keypoint detection, and textual prompt-based keypoint detection.
