Goto

Collaborating Authors

 imus


EPFL-Smart-Kitchen: An Ego-Exo Multi-Modal Dataset for Challenging Action and Motion Understanding in Video-Language Models

Neural Information Processing Systems

Understanding behavior requires datasets that capture humans while carrying out complex tasks. The kitchen is an excellent environment for assessing human motor and cognitive function, as many complex actions are naturally exhibited in kitchens from chopping to cleaning. Here, we introduce the EPFL-Smart-Kitchen-30 dataset, collected in a noninvasive motion capture platform inside a kitchen environment. Nine static RGB-D cameras, inertial measurement units (IMUs) and one head-mounted HoloLens~2 headset were used to capture 3D hand, body, and eye movements. The EPFL-Smart-Kitchen-30 dataset is a multi-view action dataset with synchronized exocentric, egocentric, depth, IMUs, eye gaze, body and hand kinematics spanning 29.7 hours of 16 subjects cooking four different recipes. Action sequences were densely annotated with 33.78 action segments per minute. Leveraging this multi-modal dataset, we propose four benchmarks to advance behavior understanding and modeling through 1) a vision-language benchmark, 2) a semantic text-to-motion generation benchmark, 3) a multi-modal action recognition benchmark, 4) a pose-based action segmentation benchmark. We expect the EPFL-Smart-Kitchen-30 dataset to pave the way for better methods as well as insights to understand the nature of ecologically-valid human behavior.


ToF-IP: Time-of-Flight Enhanced Sparse Inertial Poser for Real-time Human Motion Capture

Neural Information Processing Systems

Sparse inertial measurement units (IMUs) provide a portable, low-cost solution for human motion tracking but struggle with error accumulation from drift and sensor noise when estimating joint position through time-based linear acceleration integration (i.e., indirect measurement). To address this, we propose ToF-IP, a novel 3D full-body pose estimation system that integrates Time-of-Flight (ToF) sensors with sparse IMUs. The distinct advantage of our approach is that ToF sensors provide direct distance measurements, effectively mitigating error accumulation without relying on indirect time-based integration. From a hardware perspective, we maintain the portability of existing solutions by attaching ToF sensors to selected IMUs with a negligible volume increase of just 3\%. On the software side, we introduce two novel techniques to enhance multi-sensor integration: (i) a Node-Centric Data Integration strategy that leverages a Transformer encoder to explicitly model both intra-node and inter-node data integration by treating each sensing node as a token; and (ii) a Dynamic Spatial Positional Encoding scheme that encodes the continuously changing spatial positions of wearable nodes as motion-conditioned functions, enabling the model to better capture human body dynamics in the embedding space.Additionally, we contribute a 208-minute human motion dataset from 10 participants, including synchronized IMU-ToF measurements and ground-truth from optical tracking. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches such as PNP, achieving superior accuracy in tracking complex and slow motions like Tai Chi, which remains challenging for inertial-only methods.



Orientation-Free Neural Network-Based Bias Estimation for Low-Cost Stationary Accelerometers

arXiv.org Artificial Intelligence

Low-cost micro-electromechanical accelerometers are widely used in navigation, robotics, and consumer devices for motion sensing and position estimation. However, their performance is often degraded by bias errors. To eliminate deterministic bias terms a calibration procedure is applied under stationary conditions. It requires accelerom- eter leveling or complex orientation-dependent calibration procedures. To overcome those requirements, in this paper we present a model-free learning-based calibration method that estimates accelerometer bias under stationary conditions, without requiring knowledge of the sensor orientation and without the need to rotate the sensors. The proposed approach provides a fast, practical, and scalable solution suitable for rapid field deployment. Experimental validation on a 13.39-hour dataset collected from six accelerometers shows that the proposed method consistently achieves error levels more than 52% lower than traditional techniques. On a broader scale, this work contributes to the advancement of accurate calibration methods in orientation-free scenarios. As a consequence, it improves the reliability of low-cost inertial sensors in diverse scientific and industrial applications and eliminates the need for leveled calibration.


An Enhanced Proprioceptive Method for Soft Robots Integrating Bend Sensors and IMUs

arXiv.org Artificial Intelligence

Abstract--This study presents an enhanced proprioceptive method for accurate shape estimation of soft robots using only off-the-shelf sensors, ensuring cost-effectiveness and easy applicability. By integrating inertial measurement units (IMUs) with complementary bend sensors, IMU drift is mitigated, enabling reliable long-term proprioception. A piecewise constant curvature model estimates the tip location from the fused orientation data and reconstructs the robot's deformation. Experiments under no loading, external forces, and passive obstacle interactions during 45 minutes of continuous operation showed a root mean square error of 16.96 mm (2.91% of total length), a 56% reduction compared to IMUonly benchmarks. These results demonstrate that our approach not only enables long-duration proprioception in soft robots but also maintains high accuracy and robustness across these diverse conditions. Soft robots possess intrinsic compliance and virtually infinite degrees of freedom, enabling continuous deformation [1].


AI-Enhanced Kinematic Modeling of Flexible Manipulators Using Multi-IMU Sensor Fusion

arXiv.org Artificial Intelligence

Abstract-- This paper presents a novel framework for estimating the position and orientation of flexible manipulators undergoing vertical motion using multiple inertial measurement units (IMUs), optimized and calibrated with ground truth data. The flexible links are modeled as a series of rigid segments, with joint angles estimated from accelerometer and gyroscope measurements acquired by cost-effective IMUs. A complementary filter is employed to fuse the measurements, with its parameters optimized through particle swarm optimization (PSO) to mitigate noise and delay. T o further improve estimation accuracy, residual errors in position and orientation are compensated using radial basis function neural networks (RBFNN). Experimental results validate the effectiveness of the proposed intelligent multi-IMU kinematic estimation method, achieving root mean square errors (RMSE) of 0.00021 m, 0.00041 m, and 0.00024 rad for y, z, and ฮธ, respectively.


Physics-Informed Learning for Human Whole-Body Kinematics Prediction via Sparse IMUs

arXiv.org Artificial Intelligence

Accurate and physically feasible human motion prediction is crucial for safe and seamless human-robot collaboration. While recent advancements in human motion capture enable real-time pose estimation, the practical value of many existing approaches is limited by the lack of future predictions and consideration of physical constraints. Conventional motion prediction schemes rely heavily on past poses, which are not always available in real-world scenarios. To address these limitations, we present a physics-informed learning framework that integrates domain knowledge into both training and inference to predict human motion using inertial measurements from only 5 IMUs. We propose a network that accounts for the spatial characteristics of human movements. During training, we incorporate forward and differential kinematics functions as additional loss components to regularize the learned joint predictions. At the inference stage, we refine the prediction from the previous iteration to update a joint state buffer, which is used as extra inputs to the network. Experimental results demonstrate that our approach achieves high accuracy, smooth transitions between motions, and generalizes well to unseen subjects


MinJointTracker: Real-time inertial kinematic chain tracking with joint position estimation and minimal state size

arXiv.org Artificial Intelligence

Inertial motion capture is a promising approach for capturing motion outside the laboratory. However, as one major drawback, most of the current methods require different quantities to be calibrated or computed offline as part of the setup process, such as segment lengths, relative orientations between inertial measurement units (IMUs) and segment coordinate frames (IMU-to-segment calibrations) or the joint positions in the IMU frames. This renders the setup process inconvenient. This work contributes to real-time capable calibration-free inertial tracking of a kinematic chain, i.e. simultaneous recursive Bayesian estimation of global IMU angular kinematics and joint positions in the IMU frames, with a minimal state size. Experimental results on simulated IMU data from a three-link kinematic chain (manipulator study) as well as re-simulated IMU data from healthy humans walking (lower body study) show that the calibration-free and lightweight algorithm provides not only drift-free relative but also drift-free absolute orientation estimates with a global heading reference for only one IMU as well as robust and fast convergence of joint position estimates in the different movement scenarios.



Human Motion Capture from Loose and Sparse Inertial Sensors with Garment-aware Diffusion Models

arXiv.org Artificial Intelligence

Motion capture using sparse inertial sensors has shown great promise due to its portability and lack of occlusion issues compared to camera-based tracking. Existing approaches typically assume that IMU sensors are tightly attached to the human body. However, this assumption often does not hold in real-world scenarios. In this paper, we present Garment Inertial Poser (GaIP), a method for estimating full-body poses from sparse and loosely attached IMU sensors. We first simulate IMU recordings using an existing garment-aware human motion dataset. Our transformer-based diffusion models synthesize loose IMU data and estimate human poses from this challenging loose IMU data. We also demonstrate that incorporating garment-related parameters during training on loose IMU data effectively maintains expressiveness and enhances the ability to capture variations introduced by looser or tighter garments. Our experiments show that our diffusion methods trained on simulated and synthetic data outperform state-of-the-art inertial full-body pose estimators, both quantitatively and qualitatively, opening up a promising direction for future research on motion capture from such realistic sensor placements.