A generic diffusion-based approach for 3D human pose prediction in the wild
Saadatnejad, Saeed, Rasekh, Ali, Mofayezi, Mohammadreza, Medghalchi, Yasamin, Rajabzadeh, Sara, Mordan, Taylor, Alahi, Alexandre
arXiv.org Artificial Intelligence
Predicting 3D human poses in real-world scenarios, also known as human pose forecasting, is inevitably subject to noisy inputs arising from inaccurate 3D pose estimations and occlusions. To address these challenges, we propose a diffusion-based approach that can make predictions from noisy observations. We frame the prediction task as a denoising problem, where observation and prediction are treated as a single sequence containing missing elements (whether in the observation or the prediction horizon). All missing elements are treated as noise and denoised with our conditional diffusion model. To better handle long-term forecasting horizons, we present a temporal cascaded diffusion model. We demonstrate the benefits of our approach on four publicly available datasets (Human3.6M, HumanEva-I, AMASS, and 3DPW), outperforming the state of the art. Additionally, we show that our framework is generic enough to improve any 3D pose prediction model, serving as a pre-processing step to repair its inputs and a post-processing step to refine its outputs. The code is available online: https://github.com/vita-epfl/DePOSit.
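The "single sequence with missing elements" framing can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); the function name `build_masked_sequence`, the array shapes, and the use of standard Gaussian noise for missing entries are all assumptions made for illustration. The sketch concatenates the observed frames with an empty prediction horizon, marks occluded joints and all future frames as missing, and fills them with noise, which is the input a conditional denoiser would then refine.

```python
import numpy as np

def build_masked_sequence(observed, obs_mask, pred_len, rng):
    """Hypothetical helper: merge observation and prediction into one sequence.

    observed: (T_obs, J, 3) array of 3D joint positions.
    obs_mask: (T_obs, J) boolean array, True where a joint was actually seen.
    pred_len: number of future frames to predict.
    Returns the noise-filled sequence and the full known/missing mask.
    """
    t_obs, n_joints, dim = observed.shape
    total = t_obs + pred_len

    # Known entries: visible joints in the observation. Every future frame is missing.
    mask = np.zeros((total, n_joints), dtype=bool)
    mask[:t_obs] = obs_mask

    # Assumption: missing entries start as standard Gaussian noise.
    sequence = rng.standard_normal((total, n_joints, dim))
    # Copy the known values over the noise, leaving occluded joints as noise.
    sequence[:t_obs] = np.where(obs_mask[..., None], observed, sequence[:t_obs])
    return sequence, mask

rng = np.random.default_rng(0)
observed = np.ones((4, 2, 3))          # 4 observed frames, 2 joints
obs_mask = np.ones((4, 2), dtype=bool)
obs_mask[1, 0] = False                 # one occluded joint in frame 1
sequence, mask = build_masked_sequence(observed, obs_mask, pred_len=3, rng=rng)
```

A denoising model conditioned on `mask` would treat the `True` entries as fixed context and only update the `False` entries, so occlusion repair and future prediction are handled by the same mechanism.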
Mar-15-2023