SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation

May-28-2025, 16:23:06 GMT – Neural Information Processing Systems

Expressive human pose and shape estimation (EHPS) unifies body, hand, and face motion capture and has numerous applications. Despite encouraging progress, current state-of-the-art methods still depend largely on a confined set of training datasets. In this work, we investigate scaling up EHPS towards the first generalist foundation model (dubbed SMPLer-X), with backbones up to ViT-Huge and training on up to 4.5M instances from diverse data sources. With large-scale data and a large model, SMPLer-X exhibits strong performance across diverse test benchmarks and excellent transferability, even to unseen environments.

artificial intelligence, dataset, machine learning

Country:
- Asia
  - Japan > Honshū (0.14)
  - Middle East > Israel (0.14)

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Vision > Video Understanding (0.35)