Multiview Aggregation for Learning Category-Specific Shape Reconstruction

Neural Information Processing Systems

We investigate the problem of learning category-specific 3D shape reconstruction from a variable number of RGB views of previously unobserved object instances. Most approaches for multiview shape reconstruction operate on sparse shape representations or assume a fixed number of views. We present a method that can estimate dense 3D shape and aggregate shape across a varying number of input views. Given a single input view of an object instance, we propose a representation that encodes the dense shape of the visible object surface as well as the surface behind the line of sight that is occluded by the visible surface. When multiple input views are available, the shape representation is designed to be aggregated into a single 3D shape using an inexpensive union operation.
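To make the union-based aggregation concrete, here is a minimal sketch. It assumes, purely for illustration, that each view has already been converted into a set of 3D points in one shared canonical frame; the abstract does not describe the actual representation, and the function name `aggregate_views` and the rounding tolerance are hypothetical.

```python
import numpy as np

def aggregate_views(per_view_points, decimals=3):
    """Union of per-view point sets (assumed to share one canonical frame)."""
    # Stack the points from however many views are available.
    stacked = np.concatenate(per_view_points, axis=0)
    # Quantise so that near-duplicate points seen from several views collapse.
    quantised = np.round(stacked, decimals=decimals)
    # Unique rows give the set union of all quantised points.
    return np.unique(quantised, axis=0)

# Two toy "views" that share one point.
view_a = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]])
view_b = np.array([[0.5, 0.5, 0.5], [1.0, 1.0, 1.0]])
merged = aggregate_views([view_a, view_b])
print(merged.shape)  # (3, 3)
```

Because a union is order-independent and accepts any number of inputs, the same call works for one view or many, which is the property the abstract emphasizes.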

Implicit-Decoder -- 3D reconstruction


An encoder-decoder neural network that encodes the 3D structure of a shape from a 2D image, then decodes this structure to reconstruct the 3D shape. This is the highest-quality single-image 3D reconstruction research I have seen so far. To reconstruct the entire structure of the object, every 3D coordinate in a sampling grid is sent to the decoder (in the paper's case, 64×64×64 coordinates per object), along with the single z-vector from the image. The decoder classifies each coordinate, and the resulting labels form a voxel representation of the 3D object.
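The decoding step described above can be sketched roughly as follows. This is not the paper's network: the two-layer decoder, its random weights, the z-vector size, and the 16×16×16 grid (smaller than the paper's 64×64×64 to keep the example fast) are all assumptions chosen for illustration. Only the overall flow, pairing one z-vector with every grid coordinate and thresholding the per-coordinate output, follows the description.

```python
import numpy as np

def make_grid(resolution):
    """Return all (x, y, z) coordinates of a cubic grid, shape (resolution**3, 3)."""
    axis = np.linspace(-0.5, 0.5, resolution)
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

def toy_decoder(z_vector, coords, weights):
    """Hypothetical stand-in decoder: scores each coordinate given the z-vector."""
    # Pair the same z-vector with every coordinate.
    z_tiled = np.broadcast_to(z_vector, (coords.shape[0], z_vector.shape[0]))
    inputs = np.concatenate([coords, z_tiled], axis=1)
    hidden = np.maximum(inputs @ weights["w1"] + weights["b1"], 0.0)  # ReLU
    logits = hidden @ weights["w2"] + weights["b2"]
    return 1.0 / (1.0 + np.exp(-logits[:, 0]))  # per-coordinate score in (0, 1)

rng = np.random.default_rng(0)
z_dim, hidden_dim, resolution = 8, 16, 16  # illustrative sizes only
weights = {
    "w1": rng.normal(size=(3 + z_dim, hidden_dim)),
    "b1": np.zeros(hidden_dim),
    "w2": rng.normal(size=(hidden_dim, 1)),
    "b2": np.zeros(1),
}
z_vector = rng.normal(size=z_dim)  # would come from the image encoder
coords = make_grid(resolution)
occupancy = toy_decoder(z_vector, coords, weights)
# Threshold the scores and reshape into a voxel grid.
voxels = (occupancy > 0.5).reshape(resolution, resolution, resolution)
print(voxels.shape)
```

Since the decoder takes coordinates as input, the same weights could in principle be queried on a finer grid without retraining; the grid resolution is a choice made at decoding time.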

HybridNet: Classification and Reconstruction Cooperation for Semi-Supervised Learning

Machine Learning

In this paper, we introduce a new model for leveraging unlabeled data to improve the generalization performance of image classifiers: a two-branch encoder-decoder architecture called HybridNet. The first branch receives a supervision signal and is dedicated to the extraction of invariant class-related representations. The second branch is fully unsupervised and dedicated to modeling the information discarded by the first branch in order to reconstruct the input data. To further support the expected behavior of our model, we propose an original training objective. It favors stability in the discriminative branch and complementarity between the learned representations in the two branches. HybridNet is able to outperform state-of-the-art results on CIFAR-10, SVHN and STL-10 in various semi-supervised settings. In addition, visualizations and ablation studies validate our contributions and the behavior of the model on both the CIFAR-10 and STL-10 datasets.

A Reduction for Efficient LDA Topic Reconstruction

Neural Information Processing Systems

We present a novel approach for LDA (Latent Dirichlet Allocation) topic reconstruction. The main technical idea is to show that the distribution over the documents generated by LDA can be transformed into a distribution for a much simpler generative model in which documents are generated from the same set of topics but have a much simpler structure: documents are single-topic and topics are chosen uniformly at random. Furthermore, this reduction is approximation preserving, in the sense that approximate distributions (the only ones we can hope to compute in practice) are mapped into approximate distributions in the simplified world. This opens up the possibility of efficiently reconstructing LDA topics in a roundabout way: compute an approximate document distribution from the given corpus, transform it into an approximate distribution for the single-topic world, and run a reconstruction algorithm in the uniform, single-topic world, a much simpler task than direct LDA reconstruction.
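As a small illustration of the simplified target model only (not the reduction itself, which the abstract does not spell out), the sketch below samples documents from a single-topic world: each document picks one topic uniformly at random and draws all of its words from that topic. The function name, the toy topic matrix, and the fixed document length are assumptions made for the example.

```python
import numpy as np

def sample_single_topic_corpus(topics, num_docs, doc_length, rng):
    """Sample documents where each one uses exactly one uniformly chosen topic.

    topics: array of shape (num_topics, vocab_size); each row is a word distribution.
    """
    num_topics, vocab_size = topics.shape
    # One topic per document, chosen uniformly at random.
    topic_ids = rng.integers(0, num_topics, size=num_docs)
    # Every word of a document is drawn from that document's single topic.
    docs = np.stack([
        rng.choice(vocab_size, size=doc_length, p=topics[k]) for k in topic_ids
    ])
    return topic_ids, docs

rng = np.random.default_rng(0)
# Two toy topics over a four-word vocabulary.
topics = np.array([
    [0.7, 0.2, 0.1, 0.0],
    [0.0, 0.1, 0.2, 0.7],
])
topic_ids, docs = sample_single_topic_corpus(topics, num_docs=5, doc_length=6, rng=rng)
print(docs.shape)  # (5, 6)
```

In this simplified world each document is a clean sample from a single topic, which is what makes reconstruction there easier than in full LDA, where every document mixes several topics.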

Smartphone videos produce highly realistic 3-D face reconstructions


Normally, it takes pricey equipment and expertise to create an accurate 3-D reconstruction of someone's face that's realistic and doesn't look creepy. Now, Carnegie Mellon University researchers have pulled off the feat using video recorded on an ordinary smartphone. Using a smartphone to shoot a continuous video of the front and sides of the face generates a dense cloud of data. A two-step process developed by CMU's Robotics Institute uses that data, with some help from deep learning algorithms, to build a digital reconstruction of the face. The team's experiments show that their method can achieve sub-millimeter accuracy, outperforming other camera-based processes.