SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation
Neural Information Processing Systems
Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications. Despite encouraging progress, current state-of-the-art methods still depend largely on a confined set of training datasets. In this work, we investigate scaling up EHPS towards the first generalist foundation model (dubbed SMPLer-X), with up to ViT-Huge as the backbone and training with up to 4.5M instances from diverse data sources. With big data and the large model, SMPLer-X exhibits strong performance across diverse test benchmarks and excellent transferability to even unseen environments.
May-28-2025, 16:23:06 GMT
- Country:
- Asia
- Japan > Honshū (0.14)
- Middle East > Israel (0.14)
- Genre:
- Research Report > New Finding (0.46)
- Technology: