Supplementary: Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose Estimation

Neural Information Processing Systems 

The raw video frames are passed through a person detector [15] to obtain person-focused image sequences. Note that the detector-pruned video sequences may not exhibit smooth pixel transitions across frames. However, they retain smooth pose transitions in the view-variant, root-relative coordinate system. In our work, the shared latent pose can be seen as a parametric representation of plausible 3D poses. Accordingly, the image-to-latent model is trained to regress the latent pose parameters, with the latent serving as an intermediate 3D pose representation.
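The point about root-relative smoothness can be illustrated with a toy sketch. It is not the pipeline used in this work: the three-joint skeleton, the choice of joint 0 as the root, and the random per-frame offsets (a stand-in for frame-to-frame shifts of the detected person region) are all assumptions made for illustration, as are the helper names `root_relative` and `max_frame_jump`.

```python
import random

ROOT = 0  # assumption: joint 0 is the root joint (e.g. the pelvis)


def root_relative(pose, root=ROOT):
    """Express one pose (a list of (x, y, z) joints) relative to its root joint."""
    rx, ry, rz = pose[root]
    return [(x - rx, y - ry, z - rz) for x, y, z in pose]


def max_frame_jump(seq):
    """Largest per-joint displacement between consecutive frames of a sequence."""
    jump = 0.0
    for prev, cur in zip(seq, seq[1:]):
        for (x0, y0, z0), (x1, y1, z1) in zip(prev, cur):
            d = ((x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2) ** 0.5
            jump = max(jump, d)
    return jump


def make_sequence(num_frames=20):
    """Toy 3-joint sequence in which one joint moves slowly over time."""
    seq = []
    for t in range(num_frames):
        a = 0.01 * t
        seq.append([(0.0, 0.0, 0.0), (0.0, 0.5, 0.0), (0.3 + a, 0.9, a)])
    return seq


rng = random.Random(0)
smooth = make_sequence()

# Add an independent random offset to every frame. This is only a toy
# stand-in for the non-smooth shifts a per-frame detector can introduce.
jittered = []
for pose in smooth:
    dx, dy, dz = (rng.uniform(-0.2, 0.2) for _ in range(3))
    jittered.append([(x + dx, y + dy, z + dz) for x, y, z in pose])

rel_smooth = [root_relative(p) for p in smooth]
rel_jittered = [root_relative(p) for p in jittered]

print("jump, jittered absolute poses:", max_frame_jump(jittered))
print("jump, jittered root-relative :", max_frame_jump(rel_jittered))
print("jump, original root-relative :", max_frame_jump(rel_smooth))
```

In this toy setting the per-frame offsets cancel once each pose is expressed relative to its root, so the root-relative sequence is as smooth as the original one even though the absolute coordinates jump from frame to frame.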