RAPTR: Radar-based 3D Pose Estimation using Transformer
Sorachi Kato, Ryoma Yataka, Pu Perry Wang, Pedro Miraldo, Takuya Fujihashi, Petros Boufounos
arXiv.org Artificial Intelligence
Radar-based indoor 3D human pose estimation has typically relied on fine-grained 3D keypoint labels, which are costly to obtain, especially in complex indoor settings involving clutter, occlusions, or multiple people. In this paper, we propose RAPTR (RAdar Pose esTimation using tRansformer) under weak supervision, using only 3D BBox and 2D keypoint labels, which are considerably easier and more scalable to collect. RAPTR is characterized by a two-stage pose decoder architecture with pseudo-3D deformable attention to enhance (pose/joint) queries with multi-view radar features: a pose decoder estimates initial 3D poses with a 3D template loss designed to utilize the 3D BBox labels and mitigate depth ambiguities; and a joint decoder refines the initial poses with 2D keypoint labels and a 3D gravity loss. Evaluated on two indoor radar datasets, RAPTR outperforms existing methods, reducing joint position error by 34.3% on HIBER and 76.9% on MMVR. Our implementation is available at https://github.com/merlresearch/radar-pose-transformer.
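The abstract describes a coarse-to-fine pipeline: a pose decoder attends pose queries to multi-view radar features to produce initial 3D poses, and a joint decoder then refines those poses. A minimal sketch of that two-stage flow is shown below; the linear attention read-out, the simple gravity-style height correction, and all dimensions (`N_JOINTS`, feature sizes) are illustrative assumptions, not the paper's actual pseudo-3D deformable attention or loss formulations.

```python
import numpy as np

N_JOINTS = 17  # assumed joint count for illustration


def pose_decoder(pose_queries, radar_features):
    """Stage 1 (hypothetical sketch): map pose queries to initial 3D poses.

    A softmax attention read-out stands in for the transformer pose decoder
    with pseudo-3D deformable attention described in the abstract.
    pose_queries: (P, D), radar_features: (F, D) with D >= N_JOINTS * 3.
    """
    # attend each pose query to the flattened multi-view radar features
    attn = pose_queries @ radar_features.T                        # (P, F)
    attn -= attn.max(axis=1, keepdims=True)                       # stabilize
    weights = np.exp(attn) / np.exp(attn).sum(axis=1, keepdims=True)
    context = weights @ radar_features                            # (P, D)
    # read out N_JOINTS x 3 coordinates per pose query
    return context[:, : N_JOINTS * 3].reshape(-1, N_JOINTS, 3)


def joint_decoder(initial_poses, step=0.1):
    """Stage 2 (hypothetical sketch): refine the initial joints.

    A single gravity-style correction nudging joint heights toward their
    per-pose mean stands in for the refinement with 2D keypoint labels
    and the 3D gravity loss.
    """
    refined = initial_poses.copy()
    mean_z = refined[..., 2].mean(axis=1, keepdims=True)
    refined[..., 2] -= step * (refined[..., 2] - mean_z)
    return refined


rng = np.random.default_rng(0)
queries = rng.standard_normal((2, 64))     # 2 pose queries, dim 64
features = rng.standard_normal((128, 64))  # 128 flattened radar features
poses = joint_decoder(pose_decoder(queries, features))
```

The two stages compose directly: the refinement operates on whatever the first stage emits, mirroring how the joint decoder consumes the pose decoder's initial estimates.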
Nov-12-2025