Punching Bag vs. Punching Person: Motion Transferability in Videos
Abdullah, Raiyaan; Claypoole, Jared; Cogswell, Michael; Divakaran, Ajay; Rawat, Yogesh
–arXiv.org Artificial Intelligence
Action recognition models demonstrate strong generalization, but can they effectively transfer high-level motion concepts across diverse contexts, even within similar distributions? For example, can a model recognize the broad action "punching" when presented with an unseen variation such as "punching person"? To explore this, we introduce a motion transferability framework with three datasets: (1) Syn-TA, a synthetic dataset with 3D object motions; (2) Kinetics400-TA; and (3) Something-Something-v2-TA, both adapted from natural video datasets. We evaluate 13 state-of-the-art models on these benchmarks and observe a significant drop in performance when recognizing high-level actions in novel contexts. Our analysis reveals: 1) Multimodal models struggle more with fine-grained unknown actions than with coarse ones; 2) The bias-free Syn-TA proves as challenging as real-world datasets, with models showing greater performance drops in controlled settings; 3) Larger models improve transferability when spatial cues dominate but struggle with intensive temporal reasoning, while reliance on object and background cues hinders generalization. We further explore how disentangling coarse and fine motions can improve recognition in temporally challenging datasets. We believe this study establishes a crucial benchmark for assessing motion transferability in action recognition.
Aug-4-2025
- Country:
- Asia
- India (0.04)
- Middle East > Israel (0.04)
- Europe
- Italy (0.04)
- Switzerland (0.04)
- North America > United States
- California > Ventura County
- Thousand Oaks (0.04)
- Florida > Miami-Dade County
- Miami (0.04)
- Genre:
- Research Report > Promising Solution (0.34)
- Industry:
- Leisure & Entertainment > Sports (1.00)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning > Neural Networks (0.67)
- Natural Language (1.00)
- Representation & Reasoning (1.00)
- Vision (1.00)