DiffusionPhase: Motion Diffusion in Frequency Domain

Wan, Weilin, Huang, Yiming, Wu, Shutong, Komura, Taku, Wang, Wenping, Jayaraman, Dinesh, Liu, Lingjie

Dec-6-2023 – arXiv.org Artificial Intelligence

In this study, we introduce a learning-based method for generating high-quality human motion sequences from text descriptions (e.g., "A person walks forward"). Existing techniques struggle with motion diversity and with smooth transitions when generating arbitrary-length motion sequences, because text-to-motion datasets are limited and the pose representations in use often lack expressiveness or compactness. To address these issues, we propose the first method for text-conditioned human motion generation in the frequency domain of motions. We develop a network encoder that converts the motion space into a compact yet expressive parameterized phase space that encodes high-frequency details, capturing the local periodicity of motions in time and space with high accuracy. We also introduce a conditional diffusion model that predicts periodic motion parameters from text descriptions and a start pose, efficiently achieving smooth transitions between motion sequences associated with different text descriptions. Experiments demonstrate that our approach outperforms current methods in generating a broader variety of high-quality motions and in synthesizing long sequences with natural transitions.
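To make the idea of a "parameterized phase space" concrete, the sketch below fits a single sinusoid (amplitude, frequency, offset, phase) to a one-dimensional motion curve using a plain discrete Fourier transform. This is only an illustration under stated assumptions: the abstract does not specify the parameterization, the paper's encoder is a learned network rather than a DFT, and the names `periodic_params` and `reconstruct` are hypothetical.

```python
import cmath
import math


def periodic_params(signal, fps):
    """Fit one sinusoid to a 1-D curve sampled at `fps` frames per second.

    Returns (amplitude, frequency_hz, offset, phase). Hypothetical helper:
    a hand-written DFT stands in for the learned encoder described above.
    """
    n = len(signal)
    offset = sum(signal) / n  # mean value (DC component)
    centered = [x - offset for x in signal]

    best_k, best_coef = 1, 0j
    # Scan bins strictly below the Nyquist bin; bin 0 is already the offset.
    for k in range(1, (n + 1) // 2):
        coef = sum(
            c * cmath.exp(-2j * math.pi * k * t / n)
            for t, c in enumerate(centered)
        )
        if abs(coef) > abs(best_coef):
            best_k, best_coef = k, coef

    amplitude = 2 * abs(best_coef) / n  # one-sided spectrum scaling
    frequency = best_k * fps / n        # bin index converted to Hz
    phase = cmath.phase(best_coef)      # phase of the dominant bin
    return amplitude, frequency, offset, phase


def reconstruct(params, n, fps):
    """Rebuild n samples of the curve from the four periodic parameters."""
    amplitude, frequency, offset, phase = params
    return [
        offset + amplitude * math.cos(2 * math.pi * frequency * t / fps + phase)
        for t in range(n)
    ]
```

Four numbers per curve are far more compact than the raw samples, which hints at why a periodic parameterization can help with arbitrary-length sequences: a curve can be re-sampled at any length from its parameters. A single sinusoid cannot capture high-frequency detail, so this sketch is much less expressive than the representation the abstract describes.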

motion sequence, sequence, transition, (16 more...)

Country:
- North America > United States
  - Texas (0.04)
  - Pennsylvania (0.04)
- Asia
  - China > Hong Kong (0.04)
  - Middle East > Israel
    - Tel Aviv District > Tel Aviv (0.04)

Genre:
- Research Report > New Finding (0.66)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (1.00)
  - Robots (0.94)
  - Representation & Reasoning (0.93)
  - Natural Language > Large Language Model (0.68)