We show that the basic classification framework alone can be used to tackle some of the most challenging computer vision tasks. In contrast to other state-of-the-art approaches, the toolkit we develop is rather minimal: it uses a single, off-the-shelf classifier for all these tasks. The crux of our approach is that we train this classifier to be adversarially robust. It turns out that adversarial robustness is precisely what we need to directly manipulate salient features of the input. Overall, our findings demonstrate the utility of robustness in the broader machine learning context. Code and models for our experiments can be found at https://git.io/robust-apps.
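The "direct manipulation of salient features" the abstract alludes to amounts to gradient ascent on a class score with respect to the input itself. A minimal sketch with a toy linear stand-in for the classifier (the matrix `W`, the target class, step size, and L-infinity radius are all illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a (robust) classifier: logits = W @ x.
# W, the target class, and the L-inf radius are illustrative assumptions.
W = rng.normal(0, 1, (3, 5))  # 3 classes, 5 input "features"
x0 = rng.normal(0, 1, 5)      # the input, flattened

target, step, eps = 2, 0.1, 1.0
x = x0.copy()
for _ in range(50):
    x = x + step * W[target]             # gradient of the target logit w.r.t. x
    x = x0 + np.clip(x - x0, -eps, eps)  # project back into the L-inf ball

# The target-class score increases: the input has been nudged toward
# the features the classifier associates with that class.
gain = (W @ x)[target] - (W @ x0)[target]
```

With a robust deep network in place of the linear model, the same projected-ascent loop is what yields perceptually meaningful edits rather than adversarial noise.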
Computer vision engineers have used machine learning techniques for decades to detect objects of interest in images and to classify or identify categories of objects. They extract features representing points, regions, or objects of interest and then use those features to train a model to classify or learn patterns in the image data. In traditional machine learning, feature selection is a time-consuming manual process. Feature extraction usually involves processing each image with one or more image processing operations, such as calculating gradients, to extract the discriminative information from each image. Deep learning algorithms can learn features, representations, and tasks directly from images, text, and sound, eliminating the need for manual feature selection.
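As a concrete illustration of such a hand-crafted operation, intensity gradients can be computed with central differences; the toy image and the specific filter below are illustrative assumptions, a minimal stand-in for classical descriptors like Sobel or HOG:

```python
import numpy as np

# Toy 5x5 grayscale "image" with a vertical edge.
img = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Central-difference gradients in x and y (a minimal stand-in for
# hand-crafted feature extraction such as Sobel filtering).
gx = np.zeros_like(img)
gy = np.zeros_like(img)
gx[:, 1:-1] = (img[:, 2:] - img[:, :-2]) / 2.0
gy[1:-1, :] = (img[2:, :] - img[:-2, :]) / 2.0

# The gradient magnitude responds at the vertical edge; summaries of such
# maps (histograms, pooled statistics) served as classical features.
mag = np.hypot(gx, gy)
```

Deep networks learn filters of this kind (and far richer ones) directly from data instead of having an engineer specify them.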
Haber, Eldad (University of British Columbia, Vancouver, BC) | Ruthotto, Lars (Xtract Technologies, Vancouver, BC) | Holtham, Elliot (Emory University, Atlanta, GA) | Jun, Seong-Hwan (Xtract Technologies, Vancouver, BC)
In this work, we establish the relation between optimal control and training deep Convolutional Neural Networks (CNNs). We show that the forward propagation in CNNs can be interpreted as a time-dependent nonlinear differential equation, and learning can be seen as controlling the parameters of the differential equation such that the network approximates the data-label relation for given training data. Using this continuous interpretation, we derive two new methods to scale CNNs with respect to two different dimensions. The first class of multiscale methods connects low-resolution and high-resolution data using prolongation and restriction of CNN parameters inspired by algebraic multigrid techniques. We demonstrate that our method enables classifying high-resolution images using CNNs trained with low-resolution images and vice versa, and warm-starting the learning process. The second class of multiscale methods connects shallow and deep networks and leads to new training strategies that gradually increase the depth of the CNN while reusing parameters for initialization.
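The continuous interpretation can be sketched by treating each residual layer as one forward-Euler step of Y'(t) = σ(K(t)Y(t) + b(t)), where the step size h plays the role of layer spacing and the number of steps plays the role of depth. The random weights below stand in for the learned controls and are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigma(z):
    return np.tanh(z)  # smooth activation, as in the continuous view

# Forward propagation as forward-Euler integration of Y'(t) = sigma(K(t) Y + b(t)).
# Ks, bs are the time-dependent "controls"; h is the step size.
def forward_euler_net(y0, Ks, bs, h=0.1):
    y = y0
    for K, b in zip(Ks, bs):
        y = y + h * sigma(K @ y + b)  # one residual layer = one Euler step
    return y

d, depth = 4, 8
Ks = [rng.normal(0, 1, (d, d)) for _ in range(depth)]
bs = [rng.normal(0, 1, d) for _ in range(depth)]
y0 = rng.normal(0, 1, d)
y_out = forward_euler_net(y0, Ks, bs)
```

In this view, the depth-scaling strategy in the abstract corresponds to refining the time discretization (more, smaller Euler steps) while interpolating the controls from the coarser network as an initialization.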
I was utterly fascinated by the kind of work they were doing. They helped me explore an idea to categorize aerial images and extract features from them. The approach started by trying to extract features from aerial images using traditional techniques, but these methods gave poor or fair results at best. Machine Learning and Deep Learning techniques have shown promising results in past ImageNet Large Scale Visual Recognition Challenge (ILSVRC) competitions ("ImageNet Large Scale Visual Recognition Competition (ILSVRC)," n.d.). The Deep Learning library Caffe (Jia et al., 2014) is a high-performing tool that I could quickly pick up to start trying out these ideas. The goal was to design a high-performance Deep Learning classifier that does the job while remaining quick and easy to build. Thus, the approach of transfer learning was suggested. I am thankful that, with the guidance of my mentors Dr. Femiani and Dr. Razdan, I achieved this feat by performing Land Use Land Cover classification with the UC Merced dataset ("UC Merced Land Use Dataset," n.d.) (O. A. Penatti, Nogueira, & dos Santos, 2015) and testing the classifier with unrelated random samples.
Deep generative models (DGMs) of images are now sufficiently mature that they produce nearly photorealistic samples and obtain scores similar to those of the data distribution on heuristics such as Frechet Inception Distance. These results, especially on large-scale datasets such as ImageNet, suggest that DGMs are learning the data distribution in a perceptually meaningful space and can be used in downstream tasks. To test this latter hypothesis, we use class-conditional generative models from a number of model classes---variational autoencoders, autoregressive models, and generative adversarial networks---to infer the class labels of real data. We perform this inference by training an image classifier using only synthetic data and using the classifier to predict labels on real data. The performance on this task, which we call the Classification Accuracy Score (CAS), highlights some surprising results not captured by traditional metrics; these results comprise our contributions. First, when using a state-of-the-art GAN (BigGAN), Top-5 accuracy decreases by 41.6% compared to the original data, and conditional generative models from other model classes, such as high-resolution VQ-VAE and Hierarchical Autoregressive Models, substantially outperform GANs on this benchmark. Second, CAS automatically surfaces particular classes for which generative models fail to capture the data distribution, failures that were previously unreported in the literature. Third, we find that traditional GAN metrics such as Frechet Inception Distance are neither predictive of CAS nor useful when evaluating non-GAN models. Finally, we introduce the Naive Augmentation Score, a variant of CAS in which the image classifier is trained on both real and synthetic data, to demonstrate that naive augmentation improves classification performance only in limited circumstances. To facilitate better diagnoses of generative models, we open-source the proposed metric.
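The CAS recipe itself is simple: sample labeled synthetic data from the generative model, train a classifier on it alone, and report that classifier's accuracy on real data. A toy end-to-end sketch, where a per-class Gaussian fit stands in for the generative model and a nearest-centroid rule stands in for the image classifier (both are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# "Real" data: two well-separated Gaussian classes (toy stand-in for a dataset).
real_x = np.concatenate([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
real_y = np.array([0] * 200 + [1] * 200)

# Class-conditional "generative model": a per-class Gaussian fit to the real data.
def sample_synthetic(n_per_class):
    xs, ys = [], []
    for c in (0, 1):
        mu = real_x[real_y == c].mean(axis=0)
        sd = real_x[real_y == c].std(axis=0)
        xs.append(rng.normal(mu, sd, (n_per_class, 2)))
        ys.append(np.full(n_per_class, c))
    return np.concatenate(xs), np.concatenate(ys)

syn_x, syn_y = sample_synthetic(200)

# Train a nearest-centroid classifier on synthetic data only...
centroids = np.stack([syn_x[syn_y == c].mean(axis=0) for c in (0, 1)])

# ...and evaluate it on real data: this accuracy is the CAS.
pred = np.argmin(((real_x[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
cas = (pred == real_y).mean()
```

If the generative model misses part of a class's distribution, the classifier trained on its samples misclassifies the corresponding real examples, which is exactly how CAS surfaces per-class failures.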