Seeing Beyond the Crop: Using Language Priors for Out-of-Bounding Box Keypoint Prediction

Mar-27-2025, 04:14:05 GMT–Neural Information Processing Systems

Accurate estimation of human pose and the pose of interacting objects, such as a hockey stick, is crucial for action recognition and performance analysis, particularly in sports. Existing methods capture the object together with the human in a single bounding box and assume that all keypoints are visible within it. This requires larger bounding boxes to capture the object, which introduces unnecessary visual features and hinders performance in cluttered real-world environments. We propose TokenCLIPose, a simple multimodal solution based on image and text, to address this limitation. Our approach focuses solely on human keypoints within the bounding box and treats objects as unseen. TokenCLIPose leverages the rich semantic representations endowed by language to induce keypoint-specific context, even for occluded keypoints. We evaluate TokenCLIPose on a real-world ice hockey dataset and demonstrate its generalizability through zero-shot transfer to a smaller Lacrosse dataset.
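To make the bounding-box limitation described in the abstract concrete, the following minimal sketch compares a box fitted to human keypoints alone with one enlarged to include the object's keypoints. It is not the TokenCLIPose method: the coordinates, the variable names, and the helper functions `bounding_box`, `area`, and `outside_box` are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def bounding_box(points: Iterable[Point]) -> Box:
    """Return the tightest axis-aligned box that contains all points."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


def area(box: Box) -> float:
    """Return the area of an axis-aligned box."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def outside_box(points: Iterable[Point], box: Box) -> List[Point]:
    """Return the points that fall outside the given box."""
    x_min, y_min, x_max, y_max = box
    return [
        (x, y)
        for x, y in points
        if not (x_min <= x <= x_max and y_min <= y <= y_max)
    ]


# Hypothetical pixel coordinates, invented for this illustration only.
human_keypoints = [(100.0, 50.0), (140.0, 50.0), (90.0, 200.0), (150.0, 200.0)]
stick_keypoints = [(130.0, 120.0), (260.0, 210.0), (300.0, 215.0)]

# Box fitted to the human only, versus a box enlarged to hold the stick too.
human_box = bounding_box(human_keypoints)
joint_box = bounding_box(human_keypoints + stick_keypoints)

# How much larger the enlarged crop is, and which stick keypoints
# a human-only crop would leave outside the image region.
growth = area(joint_box) / area(human_box)
unseen = outside_box(stick_keypoints, human_box)

print(f"human-only box: {human_box}")
print(f"joint box:      {joint_box}")
print(f"area growth:    {growth:.2f}x")
print(f"stick keypoints outside the human-only box: {unseen}")
```

With these made-up coordinates, the enlarged box covers several times the area of the human-only box, and most of that extra area is background. Cropping to the human alone avoids this, but leaves some object keypoints outside the crop, which is the setting the abstract describes as treating objects as unseen.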

artificial intelligence, machine learning, natural language

Country:
- North America > United States (0.14)

Genre:
- Research Report > Experimental Study (0.93)

Industry:
- Leisure & Entertainment > Sports > Hockey (1.00)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning > Neural Networks
      - Deep Learning (0.68)
    - Natural Language (1.00)
    - Representation & Reasoning (1.00)
    - Vision (1.00)
  - Graphics (1.00)
  - Sensing and Signal Processing > Image Processing (1.00)
