FlexMotion: Lightweight, Physics-Aware, and Controllable Human Motion Generation

Arvin Tashakori, Arash Tashakori, Gongbo Yang, Z. Jane Wang, Peyman Servati

arXiv.org Artificial Intelligence 

Lightweight, controllable, and physically plausible human motion synthesis is crucial for applications in animation, virtual reality, robotics, and human-computer interaction. We propose FlexMotion, a novel framework that uses a computationally lightweight diffusion model operating in a latent space, which eliminates the need for physics simulators and enables fast, efficient training. FlexMotion employs a multimodal pre-trained Transformer encoder-decoder that integrates joint locations, contact forces, joint actuations, and muscle activations to ensure the physical plausibility of the generated motions. FlexMotion also introduces a plug-and-play module that adds spatial controllability over a range of motion parameters (e.g., joint locations, joint actuations, contact forces, and muscle activations). Our framework achieves realistic motion generation with improved efficiency and control, setting a new benchmark for human motion synthesis. We evaluate FlexMotion on extended datasets and demonstrate its superior performance in terms of realism, physical plausibility, and controllability.

Human motion involves complex interactions among joint movements, contact forces, and muscle activations, which calls for a comprehensive approach that captures both kinematic and dynamic aspects. Despite remarkable progress in human motion generation, it remains challenging to build models that balance physical realism, computational efficiency, and fine-grained controllability. Traditional methods often fail to account for the intricate biomechanics of human movement, which involve complex interactions among kinematics, dynamics, and environmental context (Tripathi et al., 2023b; Zhang et al., 2024b; Xie et al., 2021a; Chiquier & Vondrick, 2023). This deficiency is particularly pronounced in applications such as sports and rehabilitation, where accurate muscle activations and contact forces are crucial for faithful simulation (Chiquier & Vondrick, 2023).
Furthermore, current methods that target physical plausibility often depend on computationally expensive components such as physics engines, which makes them impractical for real-time applications (Yuan et al., 2023; Xie et al., 2021a; Tripathi et al., 2023a).
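The efficiency argument rests on running diffusion in a compact latent space instead of querying a physics simulator. The following is a minimal sketch of only the generic forward (noising) half of a standard DDPM-style latent diffusion model, not FlexMotion's implementation. The linear beta schedule, the 1000 steps, the four-dimensional latent vector, and the function names `linear_beta_schedule`, `cumulative_alphas`, and `q_sample` are all hypothetical placeholders. What the sketch illustrates is that a noisy latent at any step t can be drawn in closed form, with no physics engine in the training loop.

```python
import math
import random


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> list[float]:
    """Linearly spaced noise variances beta_1..beta_T (a common DDPM default,
    assumed here; the paper's actual schedule is not stated in this section)."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas: list[float]) -> list[float]:
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(z0: list[float], t: int, alpha_bars: list[float],
             rng: random.Random) -> tuple[list[float], list[float]]:
    """Draw z_t ~ N(sqrt(alpha_bar_t) * z0, (1 - alpha_bar_t) * I) in one shot.

    Returns (z_t, eps) so a denoiser could be trained to predict eps from z_t.
    """
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(z0, eps)]
    return zt, eps


# Usage with hypothetical values: z0 stands in for the latent code that a
# motion encoder would produce from joints, contacts, actuations, and muscles.
T = 1000
betas = linear_beta_schedule(T)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
z0 = [0.5, -1.2, 0.3, 0.8]
t = 500
zt, eps = q_sample(z0, t, alpha_bars, rng)
```

Because `alpha_bars` shrinks toward zero as t grows, late-step latents are almost pure Gaussian noise, which is what lets a reverse (denoising) model start sampling from noise at inference time.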