FlashMo: Geometric Interpolants and Frequency-Aware Sparsity for Scalable Efficient Motion Generation

Jun-16-2026, 01:55:03 GMT–Neural Information Processing Systems

Notably, recent progress in text-to-motion generation, particularly with autoregressive [100, 60, 59, 30, 85, 39, 92] and diffusion models [70, 93, 7, 94, 44, 73], has enabled the synthesis of natural human motion from natural language. While VQ-VAE-based autoregressive methods achieve outstanding quantitative results, they generate less natural motion with jitters due to frame-wise noise arising from directly decoding discrete tokens, and fine-grained motion details are sometimes lost during token discretization [13]. In contrast, motion diffusion models generate smoother and more realistic human motion, showing a promising trend in human motion generation [73, 74, 95]. However, despite their strengths, diffusion-based approaches still face two significant challenges, collectively limiting their applicability in real-world scenarios.

arxiv preprint arxiv, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Jun-16-2026, 01:55:03 GMT

Conferences PDF

Add feedback

Country:
- Asia (0.28)

Genre:
- Research Report > Experimental Study (1.00)

Industry:
- Information Technology (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Representation & Reasoning (1.00)
  - Natural Language (1.00)
  - Robots (0.93)
  - Machine Learning > Neural Networks (0.92)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found