Synthetic-to-Real Pose Estimation with Geometric Reconstruction

Qiuxia Lin, Kerui Gu, Linlin Yang

Neural Information Processing Systems 

The warping estimation module W is based on an hourglass network with five conv3×3-bn-relu-pool2×2 blocks in the encoder and five upsample2×2-conv3×3-bn-relu blocks in the decoder. For G, we use the Johnson architecture [3] with two down-sampling blocks, six residual blocks, and two up-sampling blocks. Its inputs are the base image, the displacement field, and the inpainting map. G downsamples its input by a factor of 4 and then upsamples by a factor of 4 to produce the output, i.e. the reconstructed image. The generator is pre-trained with predicted keypoints before applying the geometric reconstruction module.
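As a rough sanity check on the layer counts above, the following sketch traces the spatial resolution through W and G. It is not the authors' implementation. The 256-pixel input size, the padding of 1 on every 3×3 convolution, the stride-2 convolutions in the down-sampling blocks of G, and the function names `hourglass_trace` and `generator_trace` are all assumptions made for illustration.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output size of a pooling layer along one spatial dimension."""
    return (size - kernel) // stride + 1


def hourglass_trace(size, depth=5):
    """Resolution after each block of W (assumed padding=1 on conv3x3)."""
    sizes = [size]
    for _ in range(depth):  # encoder block: conv3x3 -> bn -> relu -> pool2x2
        size = pool_out(conv_out(size))
        sizes.append(size)
    for _ in range(depth):  # decoder block: upsample2x2 -> conv3x3 -> bn -> relu
        size = conv_out(size * 2)
        sizes.append(size)
    return sizes


def generator_trace(size, down=2, res=6, up=2):
    """Resolution after each block of G (assumed stride-2 down blocks)."""
    sizes = [size]
    for _ in range(down):  # down-sampling block: halves the resolution
        size = conv_out(size, kernel=3, stride=2, padding=1)
        sizes.append(size)
    for _ in range(res):  # residual block: two conv3x3 layers, same resolution
        size = conv_out(conv_out(size))
        sizes.append(size)
    for _ in range(up):  # up-sampling block: doubles the resolution
        size = size * 2
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    print("W:", hourglass_trace(256))
    print("G:", generator_trace(256))
```

Under these assumptions, the five pooling stages reduce the resolution of W by a factor of 32 at the bottleneck, and the two down-sampling blocks of G give the factor of 4 stated in the text before the output is restored to the input resolution.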