Unsupervised Learning of Shape and Pose with Differentiable Point Clouds
We address the problem of learning accurate 3D shape and camera pose from a collection of unlabeled category-specific images. We train a convolutional network to predict both the shape and the pose from a single image by minimizing the reprojection error: given several views of an object, the projections of the predicted shapes under the predicted camera poses should match the provided views. To deal with pose ambiguity, we introduce an ensemble of pose predictors, which we then distill into a single "student" model. To allow for efficient learning of high-fidelity shapes, we represent the shapes as point clouds and devise a formulation that makes their projection differentiable. Our experiments show that the distilled ensemble of pose predictors learns to estimate the pose accurately, while the point cloud representation makes it possible to predict detailed shape models.
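To make the reprojection idea in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' formulation. It assumes an orthographic camera, a pose reduced to a single rotation angle about the vertical axis, and Gaussian "splatting" of each point onto the image plane. The function names (`rotate_y`, `soft_silhouette`, `reprojection_error`) and the parameters `size` and `sigma` are hypothetical. The point of the sketch is that every step is a smooth function of the point coordinates and the pose, so the same operations written in an automatic-differentiation framework would yield gradients with respect to both.

```python
import math

def rotate_y(points, angle):
    """Rotate 3D points about the vertical (y) axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]

def soft_silhouette(points, angle, size=16, sigma=0.08):
    """Orthographic projection of a point cloud to a soft silhouette.

    Each point contributes a Gaussian blob on the image plane. Blobs are
    combined as 1 - prod(1 - g), which stays in [0, 1] and is smooth in
    the point coordinates and in the pose angle (assumed simplification).
    """
    projected = [(x, y) for x, y, _ in rotate_y(points, angle)]  # drop depth
    image = []
    for row in range(size):
        v = -1.0 + 2.0 * (row + 0.5) / size  # pixel centre in [-1, 1]
        line = []
        for col in range(size):
            u = -1.0 + 2.0 * (col + 0.5) / size
            empty = 1.0  # soft probability that no point covers this pixel
            for px, py in projected:
                d2 = (u - px) ** 2 + (v - py) ** 2
                empty *= 1.0 - math.exp(-d2 / (2.0 * sigma ** 2))
            line.append(1.0 - empty)
        image.append(line)
    return image

def reprojection_error(predicted, target):
    """Mean squared difference between two images of equal size."""
    n = len(predicted) * len(predicted[0])
    return sum((p - t) ** 2
               for p_row, t_row in zip(predicted, target)
               for p, t in zip(p_row, t_row)) / n

# Toy example: the "provided view" is rendered from the true pose (0.7 rad).
cloud = [(0.5, 0.0, 0.0), (-0.2, 0.3, 0.4), (0.0, -0.4, -0.3)]
target = soft_silhouette(cloud, angle=0.7)
error_true_pose = reprojection_error(soft_silhouette(cloud, 0.7), target)
error_wrong_pose = reprojection_error(soft_silhouette(cloud, 2.0), target)
```

In this toy setting the error vanishes at the true pose and is positive at the wrong one, which is the signal a training loop would minimize over both shape and pose.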
Reviews: Unsupervised Learning of Shape and Pose with Differentiable Point Clouds
I maintain my original review and think the paper should be accepted. To get around the ambiguity of shape and pose, the authors propose to have an ensemble of pose predictors, which they distill post-training into a single model. I am inclined to accept the paper. The method is a solid solution to an interesting problem and the paper is well-written. In more detail: a) This is clearly a novel solution to an interesting but, so far, poorly explored problem.
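The ensemble-then-distill idea mentioned in the review can be illustrated with a small sketch. The text here does not spell out how the ensemble is trained or distilled, so the following is one plausible reading, assumed for illustration: each ensemble member proposes a pose for an image, the reprojection error picks the best proposal, and the student is trained to reproduce that pose. Poses are reduced to a single angle, and all function names are hypothetical.

```python
import math

def select_best_pose(candidate_poses, errors):
    """Return the candidate pose with the lowest reprojection error, and its index."""
    best = min(range(len(errors)), key=lambda k: errors[k])
    return candidate_poses[best], best

def distillation_targets(batch_poses, batch_errors):
    """For each image, pick the ensemble pose the student should imitate."""
    return [select_best_pose(poses, errors)[0]
            for poses, errors in zip(batch_poses, batch_errors)]

def student_loss(student_poses, targets):
    """Mean squared angular difference, with differences wrapped to [-pi, pi]."""
    total = 0.0
    for s, t in zip(student_poses, targets):
        d = math.atan2(math.sin(s - t), math.cos(s - t))
        total += d * d
    return total / len(targets)

# Two images, three ensemble members each (angles in radians, made-up values).
ensemble_poses = [[0.1, 3.2, 1.5], [2.0, 0.4, 5.0]]
ensemble_errors = [[0.30, 0.02, 0.25], [0.01, 0.40, 0.33]]
targets = distillation_targets(ensemble_poses, ensemble_errors)
```

Selecting by lowest error lets different ensemble members cover different pose hypotheses for ambiguous views, while the student ends up with a single prediction per image.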
Unsupervised Learning of Shape and Pose with Differentiable Point Clouds
Insafutdinov, Eldar, Dosovitskiy, Alexey