Autoregressive Motion Generation with Gaussian Mixture-Guided Latent Sampling

Neural Information Processing Systems 

Existing efforts in motion synthesis typically use either generative transformers with discrete representations or diffusion models with continuous representations. However, the discretization step in generative transformers can introduce motion errors, while sampling from diffusion models tends to be slow. In this paper, we propose GMMotion, a novel text-to-motion synthesis method that combines a continuous motion representation with an autoregressive model and uses a Gaussian mixture model (GMM) to represent the conditional probability distribution. Unlike autoregressive approaches that rely on residual vector quantization, our model employs continuous motion representations derived from the latent space of a VAE.
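
To make the idea of GMM-guided latent sampling concrete, the following is a minimal NumPy sketch of how an autoregressive model might draw the next continuous latent from a predicted Gaussian mixture. It is an illustration under stated assumptions, not the GMMotion implementation: the diagonal covariance, the function names, and the hand-written `toy_head` (a stand-in for a learned, text-conditioned network) are all hypothetical.

```python
import numpy as np


def sample_gmm(logits, means, log_stds, rng):
    """Draw one latent vector from a diagonal-covariance Gaussian mixture.

    logits:   (K,)   unnormalized mixture weights
    means:    (K, D) component means
    log_stds: (K, D) component log standard deviations
    """
    # Softmax over components, shifted by the max for numerical stability.
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Pick a component, then sample from its Gaussian.
    k = rng.choice(len(weights), p=weights)
    noise = rng.standard_normal(means.shape[1])
    return means[k] + np.exp(log_stds[k]) * noise


def gmm_log_prob(z, logits, means, log_stds):
    """Log-density of z under the mixture (its negative is a training loss)."""
    log_w = logits - logits.max()
    log_w = log_w - np.log(np.exp(log_w).sum())
    # Per-component diagonal Gaussian log-density, summed over dimensions.
    log_comp = -0.5 * (
        (z - means) ** 2 / np.exp(2.0 * log_stds)
        + 2.0 * log_stds
        + np.log(2.0 * np.pi)
    ).sum(axis=1)
    # Log-sum-exp over components.
    a = log_w + log_comp
    m = a.max()
    return m + np.log(np.exp(a - m).sum())


def toy_head(prev_latent, num_components=3):
    """Hypothetical stand-in for a learned network that maps the previous
    latent (and, in a real model, the text condition) to GMM parameters."""
    offsets = np.linspace(-0.5, 0.5, num_components)[:, None]
    means = prev_latent[None, :] + offsets
    logits = np.zeros(num_components)
    log_stds = np.full(means.shape, -2.0)
    return logits, means, log_stds


def generate(num_steps, dim, seed=0):
    """Autoregressively sample a sequence of continuous latents."""
    rng = np.random.default_rng(seed)
    z = np.zeros(dim)
    latents = []
    for _ in range(num_steps):
        logits, means, log_stds = toy_head(z)
        z = sample_gmm(logits, means, log_stds, rng)
        latents.append(z)
    return np.stack(latents)  # shape: (num_steps, dim)
```

In this sketch each step needs only one forward pass and one draw from the mixture, in contrast to the iterative denoising loop of a diffusion sampler, and the sampled latents stay continuous, so no codebook lookup is involved. A VAE decoder, omitted here, would map the sampled latents back to motion.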