Image Processing
Multi-Label Prediction via Compressed Sensing
Hsu, Daniel J., Kakade, Sham M., Langford, John, Zhang, Tong
We consider multi-label prediction problems with large output spaces under the assumption of output sparsity - that the target (label) vectors have small support. We develop a general theory for a variant of the popular error correcting output code scheme, using ideas from compressed sensing for exploiting this sparsity. The method can be regarded as a simple reduction from multi-label regression problems to binary regression problems. We show that the number of subproblems need only be logarithmic in the total number of possible labels, making this approach radically more efficient than others. We also state and prove robustness guarantees for this method in the form of regret transform bounds (in general), and also provide a more detailed analysis for the linear prediction setting.
Filtering Abstract Senses From Image Search Results
We propose an unsupervised method that, given a word, automatically selects non-abstract senses of that word from an online ontology and generates images depicting the corresponding entities. When faced with the task of learning a visual model based only on the name of an object, a common approach is to find images on the web that are associated with the object name, and then train a visual classifier from the search result. As words are generally polysemous, this approach can lead to relatively noisy models if many examples due to outlier senses are added to the model. We argue that images associated with an abstract word sense should be excluded when training a visual classifier to learn a model of a physical object. While image clustering can group together visually coherent sets of returned images, it can be difficult to distinguish whether an image cluster relates to a desired object or to an abstract sense of the word. We propose a method that uses both image features and the text associated with the images to relate latent topics to particular senses. Our model does not require any human supervision, and takes as input only the name of an object category. We show results of retrieving concrete-sense images in two available multimodal, multi-sense databases, as well as experiment with object classifiers trained on concrete-sense images returned by our method for a set of ten common office objects.
Segmenting Scenes by Matching Image Composites
Russell, Bryan, Efros, Alyosha, Sivic, Josef, Freeman, Bill, Zisserman, Andrew
In this paper, we investigate how, given an image, similar images sharing the same global description can help with unsupervised scene segmentation. In contrast to recent work in semantic alignment of scenes, we allow an input image to be explained by partial matches of similar scenes. This allows for a better explanation of the input scenes. We perform MRF-based segmentation that optimizes over matches, while respecting boundary information. The recovered segments are then used to re-query a large database of images to retrieve better matches for the target regions. We show improved performance in detecting the principal occluding and contact boundaries for the scene over previous methods on data gathered from the LabelMe database.
The 'tree-dependent components' of natural scenes are edge filters
We propose a new model for natural image statistics. Instead of minimizing dependency between components of natural images, we maximize a simple form of dependency in the form of tree-dependency. By learning filters and tree structures which are best suited for natural images we observe that the resulting filters are edge filters, similar to the famous ICA on natural images results. Calculating the likelihood of the model requires estimating the squared output of pairs of filters connected in the tree. We observe that after learning, these pairs of filters are predominantly of similar orientations but different phases, so their joint energy resembles models of complex cells.
Natural Image Denoising with Convolutional Networks
We present an approach to low-level vision that combines two main ideas: the use of convolutional networks as an image processing architecture and an unsupervised learning procedure that synthesizes training samples from specific noise models. We demonstrate this approach on the challenging problem of natural image denoising. Using a test set with a hundred natural images, we find that convolutional networks provide comparable and in some cases superior performance to state of the art wavelet and Markov random field (MRF) methods. Moreover, we find that a convolutional network offers similar performance in the blind denoising setting as compared to other techniques in the non-blind setting. We also show how convolutional networks are mathematically related to MRF approaches by presenting a mean field theory for an MRF specially designed for image denoising. Although these approaches are related, convolutional networks avoid computational difficulties in MRF approaches that arise from probabilistic learning and inference. This makes it possible to learn image processing architectures that have a high degree of representational power (we train models with over 15,000 parameters), but whose computational expense is significantly less than that associated with inference in MRF approaches with even hundreds of parameters.
An Additive Latent Feature Model for Transparent Object Recognition
Fritz, Mario, Bradski, Gary, Karayev, Sergey, Darrell, Trevor, Black, Michael J.
Existing methods for recognition of object instances and categories based on quantized local features can perform poorly when local features exist on transparent surfaces, such as glass or plastic objects. There are characteristic patterns to the local appearance of transparent objects, but they may not be well captured by distances to individual examples or by a local pattern codebook obtained by vector quantization. The appearance of a transparent patch is determined in part by the refraction of a background pattern through a transparent medium: the energy from the background usually dominates the patch appearance. We model transparent local patch appearance using an additive model of latent factors: background factors due to scene content, and factors which capture a local edge energy distribution characteristic of the refraction. We implement our method using a novel LDA-SIFT formulation which performs LDA prior to any vector quantization step; we discover latent topics which are characteristic of particular transparent patches and quantize the SIFT space into transparent visual words according to the latent topic dimensions. No knowledge of the background scene is required at test time; we show examples recognizing transparent glasses in a domestic environment.
Speeding up Magnetic Resonance Image Acquisition by Bayesian Multi-Slice Adaptive Compressed Sensing
We show how to sequentially optimize magnetic resonance imaging measurement designs over stacks of neighbouring image slices, by performing convex variational inference on a large scale non-Gaussian linear dynamical system, tracking dominating directions of posterior covariance without imposing any factorization constraints. Our approach can be scaled up to high-resolution images by reductions to numerical mathematics primitives and parallelization on several levels. In a first study, designs are found that improve significantly on others chosen independently for each slice or drawn at random.
Group Sparse Coding
Bengio, Samy, Pereira, Fernando, Singer, Yoram, Strelow, Dennis
Bag-of-words document representations are often used in text, image and video processing. While it is relatively easy to determine a suitable word dictionary for text documents, there is no simple mapping from raw images or videos to dictionary terms. The classical approach builds a dictionary using vector quantization over a large set of useful visual descriptors extracted from a training set, and uses a nearest-neighbor algorithm to count the number of occurrences of each dictionary word in documents to be encoded. More robust approaches have been proposed recently that represent each visual descriptor as a sparse weighted combination of dictionary words. While favoring a sparse representation at the level of visual descriptors, those methods however do not ensure that images have sparse representation. In this work, we use mixed-norm regularization to achieve sparsity at the image level as well as a small overall dictionary. This approach can also be used to encourage using the same dictionary words for all the images in a class, providing a discriminative signal in the construction of image representations. Experimental results on a benchmark image classification dataset show that when compact image or dictionary representations are needed for computational efficiency, the proposed approach yields better mean average precision in classification.
Unsupervised Detection of Regions of Interest Using Iterative Link Analysis
Kim, Gunhee, Torralba, Antonio
This paper proposes a fast and scalable alternating optimization technique to detect regionsof interest (ROIs) in cluttered Web images without labels. The proposed approachdiscovers highly probable regions of object instances by iteratively repeating the following two functions: (1) choose the exemplar set (i.e. a small number of highly ranked reference ROIs) across the dataset and (2) refine the ROIs of each image with respect to the exemplar set. These two subproblems are formulated as ranking in two different similarity networks of ROI hypotheses by link analysis. The experiments with the PASCAL 06 dataset show that our unsupervised localization performance is better than one of state-of-the-art techniques andcomparable to supervised methods. Also, we test the scalability of our approach with five objects in Flickr dataset consisting of more than 200K images.
Hierarchical Modeling of Local Image Features through $L_p$-Nested Symmetric Distributions
Bethge, Matthias, Simoncelli, Eero P., Sinz, Fabian H.
We introduce a new family of distributions, called $L_p${\em -nested symmetric distributions}, whose densities access the data exclusively through a hierarchical cascade of $L_p$-norms. This class generalizes the family of spherically and $L_p$-spherically symmetric distributions which have recently been successfully used for natural image modeling. Similar to those distributions it allows for a nonlinear mechanism to reduce the dependencies between its variables. With suitable choices of the parameters and norms, this family also includes the Independent Subspace Analysis (ISA) model, which has been proposed as a means of deriving filters that mimic complex cells found in mammalian primary visual cortex. $L_p$-nested distributions are easy to estimate and allow us to explore the variety of models between ISA and the $L_p$-spherically symmetric models. Our main findings are that, without a preprocessing step of contrast gain control, the independent subspaces of ISA are in fact more dependent than the individual filter coefficients within a subspace and, with contrast gain control, where ISA finds more than one subspace, the filter responses were almost independent anyway.