RAPTR: Radar-based 3D Pose Estimation using Transformer
Sorachi Kato, Ryoma Yataka, Pu Perry Wang, Pedro Miraldo, Takuya Fujihashi, Petros Boufounos
arXiv.org Artificial Intelligence
Radar-based indoor 3D human pose estimation has typically relied on fine-grained 3D keypoint labels, which are costly to obtain, especially in complex indoor settings involving clutter, occlusions, or multiple people. In this paper, we propose RAPTR (RAdar Pose esTimation using tRansformer) under weak supervision, using only 3D BBox and 2D keypoint labels, which are considerably easier and more scalable to collect. RAPTR is characterized by a two-stage pose decoder architecture with pseudo-3D deformable attention to enhance (pose/joint) queries with multi-view radar features: a pose decoder estimates initial 3D poses with a 3D template loss designed to utilize the 3D BBox labels and mitigate depth ambiguities; and a joint decoder refines the initial poses with 2D keypoint labels and a 3D gravity loss. Evaluated on two indoor radar datasets, RAPTR outperforms existing methods, reducing joint position error by 34.3% on HIBER and 76.9% on MMVR. Our implementation is available at https://github.com/merlresearch/radar-pose-transformer.
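The abstract describes a coarse-to-fine pipeline: a pose decoder attends pose queries to multi-view radar features to produce initial 3D poses, and a joint decoder then refines those poses. A minimal sketch of that two-stage flow is shown below; the linear attention read-out, the simple gravity-style height correction, and all dimensions (`N_JOINTS`, feature sizes) are illustrative assumptions, not the paper's actual pseudo-3D deformable attention or loss formulations.

```python
import numpy as np

N_JOINTS = 17  # assumed joint count for illustration


def pose_decoder(pose_queries, radar_features):
    """Stage 1 (hypothetical sketch): map pose queries to initial 3D poses.

    A softmax attention read-out stands in for the transformer pose decoder
    with pseudo-3D deformable attention described in the abstract.
    pose_queries: (P, D), radar_features: (F, D) with D >= N_JOINTS * 3.
    """
    # attend each pose query to the flattened multi-view radar features
    attn = pose_queries @ radar_features.T                        # (P, F)
    attn -= attn.max(axis=1, keepdims=True)                       # stabilize
    weights = np.exp(attn) / np.exp(attn).sum(axis=1, keepdims=True)
    context = weights @ radar_features                            # (P, D)
    # read out N_JOINTS x 3 coordinates per pose query
    return context[:, : N_JOINTS * 3].reshape(-1, N_JOINTS, 3)


def joint_decoder(initial_poses, step=0.1):
    """Stage 2 (hypothetical sketch): refine the initial joints.

    A single gravity-style correction nudging joint heights toward their
    per-pose mean stands in for the refinement with 2D keypoint labels
    and the 3D gravity loss.
    """
    refined = initial_poses.copy()
    mean_z = refined[..., 2].mean(axis=1, keepdims=True)
    refined[..., 2] -= step * (refined[..., 2] - mean_z)
    return refined


rng = np.random.default_rng(0)
queries = rng.standard_normal((2, 64))     # 2 pose queries, dim 64
features = rng.standard_normal((128, 64))  # 128 flattened radar features
poses = joint_decoder(pose_decoder(queries, features))
```

The two stages compose directly: the refinement operates on whatever the first stage emits, mirroring how the joint decoder consumes the pose decoder's initial estimates.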
Nov-12-2025