KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension
Jie Yang, Wang Zeng
Neural Information Processing Systems
Recent advancements in Multimodal Large Language Models (MLLMs) have greatly improved their abilities in image understanding. However, these models often struggle with grasping pixel-level semantic details, e.g., the keypoints of an object. To bridge this gap, we introduce the novel challenge of Semantic Keypoint Comprehension, which aims to comprehend keypoints across different task scenarios, including keypoint semantic understanding, visual prompt-based keypoint detection, and textual prompt-based keypoint detection.
Oct-10-2025, 22:37:09 GMT
- Country:
- Asia > China
- Guangdong Province > Shenzhen (0.04)
- Hong Kong (0.04)
- Genre:
- Research Report
- Experimental Study (0.93)
- New Finding (0.68)
- Industry:
- Information Technology (0.67)
- Technology: