We consider inpainting in an unsupervised setting where there is neither access to paired nor unpaired training data. The only available information is provided by the uncomplete observations and the inpainting process statistics. In this context, an observation should give rise to several plausible reconstructions which amounts at learning a distribution over the space of reconstructed images. We model the reconstruction process by using a conditional GAN with constraints on the stochastic component that introduce an explicit dependency between this component and the generated output. This allows us sampling from the latent component in order to generate a distribution of images associated to an observation. We demonstrate the capacity of our model on several image datasets: faces (CelebA), food images (Recipe-1M) and bedrooms (LSUN Bedrooms) with different types of imputation masks. The approach yields comparable performance to model variants trained with additional supervision.
Adversarial attack methods have demonstrated the fragility of deep neural networks. Their imperceptible perturbations are frequently able fool classifiers into potentially dangerous misclassifications. We propose a novel way to interpret adversarial perturbations in terms of the effective input signal that classifiers actually use. Based on this, we apply specially trained autoencoders, referred to as S2SNets, as defense mechanism. They follow a two-stage training scheme: first unsupervised, followed by a fine-tuning of the decoder, using gradients from an existing classifier. S2SNets induce a shift in the distribution of gradients propagated through them, stripping them from class-dependent signal. We analyze their robustness against several white-box and gray-box scenarios on the large ImageNet dataset. Our approach reaches comparable resilience in white-box attack scenarios as other state-of-the-art defenses in gray-box scenarios. We further analyze the relationships of AlexNet, VGG 16, ResNet 50 and Inception v3 in adversarial space, and found that VGG 16 is the easiest to fool, while perturbations from ResNet 50 are the most transferable.
We present a simple technique that allows capsule models to detect adversarial images. In addition to being trained to classify images, the capsule model is trained to reconstruct the images from the pose parameters and identity of the correct top-level capsule. Adversarial images do not look like a typical member of the predicted class and they have much larger reconstruction errors when the reconstruction is produced from the top-level capsule for that class. We show that setting a threshold on the $l2$ distance between the input image and its reconstruction from the winning capsule is very effective at detecting adversarial images for three different datasets. The same technique works quite well for CNNs that have been trained to reconstruct the image from all or part of the last hidden layer before the softmax. We then explore a stronger, white-box attack that takes the reconstruction error into account. This attack is able to fool our detection technique but in order to make the model change its prediction to another class, the attack must typically make the "adversarial" image resemble images of the other class.
Deep generative architectures provide a way to model not only images, but also complex, 3-dimensional objects, such as point clouds. In this work, we present a novel method to obtain meaningful representations of 3D shapes that can be used for clustering and reconstruction. Contrary to existing methods for 3D point cloud generation that train separate decoupled models for representation learning and generation, our approach is the first end-to-end solution that allows to simultaneously learn a latent space of representation and generate 3D shape out of it. To achieve this goal, we extend a deep Adversarial Autoencoder model (AAE) to accept 3D input and create 3D output. Thanks to our end-to-end training regime, the resulting method called 3D Adversarial Autoencoder (3dAAE) obtains either binary or continuous latent space that covers much wider portion of training data distribution, hence allowing smooth interpolation between the shapes. Finally, our extensive quantitative evaluation shows that 3dAAE provides state-of-the-art results on a set of benchmark tasks.
Generative Adversarial Network(GAN) provides a good generative framework to produce realistic samples, but suffers from two recognized issues as mode collapse and unstable training. In this work, we propose to employ explicit manifold learning as prior to alleviate mode collapse and stabilize training of GAN. Since the basic assumption of conventional manifold learning fails in case of sparse and uneven data distribution, we introduce a new target, Minimum Manifold Coding (MMC), for manifold learning to encourage simple and unfolded manifold. In essence, MMC is the general case of the shortest Hamiltonian Path problem and pursues manifold with minimum Riemann volume. Using the standardized code from MMC as prior, GAN is guaranteed to recover a simple and unfolded manifold covering all the training data. Our experiments on both the toy data and real datasets show the effectiveness of MMCGAN in alleviating mode collapse, stabilizing training, and improving the quality of generated samples.