Intend to Move: A Multimodal Dataset for Intention-Aware Human Motion Understanding
Neural Information Processing Systems
Human motion is inherently intentional, yet most motion modeling paradigms focus on low-level kinematics, overlooking the semantic and causal factors that drive behavior. Existing datasets further limit progress: they capture short, decontextualized actions in static scenes, providing little grounding for embodied reasoning. To address these limitations, we introduce Intend to Move (I2M), a large-scale, multimodal dataset for intention-grounded motion modeling. I2M contains 10.1 hours of two-person 3D motion sequences recorded in dynamic, realistic home environments, accompanied by multi-view RGB-D video, 3D scene geometry, and language annotations of each participant's evolving intentions. Benchmark experiments reveal a fundamental gap in current motion models: they fail to translate high-level goals into physically and socially coherent motion. I2M thus serves not only as a dataset but also as a benchmark for embodied intelligence, enabling research on models that can reason about, predict, and act upon the "why" behind human motion.
Jun-14-2026, 11:53:27 GMT
- Genre:
- Research Report > Experimental Study (1.00)
- Industry:
- Media (0.46)
- Information Technology (0.46)
- Technology:
- Information Technology
- Communications (0.68)
- Sensing and Signal Processing (0.67)
- Artificial Intelligence
- Vision (1.00)
- Machine Learning (1.00)
- Robots (0.94)
- Natural Language (0.94)
- Representation & Reasoning (0.93)