 Image Processing


A Solution for Multi-Alignment by Transformation Synchronisation

arXiv.org Machine Learning

The alignment of a set of objects by means of transformations plays an important role in computer vision. Whilst the case for only two objects can be solved globally, when multiple objects are considered usually iterative methods are used. In practice the iterative methods perform well if the relative transformations between any pair of objects are free of noise. However, if only noisy relative transformations are available (e.g. due to missing data or wrong correspondences) the iterative methods may fail. Based on the observation that the underlying noise-free transformations can be retrieved from the null space of a matrix that can directly be obtained from pairwise alignments, this paper presents a novel method for the synchronisation of pairwise transformations such that they are transitively consistent. Simulations demonstrate that for noisy transformations, a large proportion of missing data and even for wrong correspondence assignments the method delivers encouraging results.
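The null-space observation in this abstract can be checked numerically. The sketch below is a minimal illustration, not the paper's method: it assumes noise-free 2D rotations, builds the block matrix W of relative transformations T_ij = T_i T_j^{-1} (a common formulation; the abstract does not spell out the exact matrix), and confirms that the stacked transformations U satisfy (W - kI)U = 0 for k objects. All function names are hypothetical.

```python
import math

def rot(theta):
    """2x2 rotation matrix as nested lists."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def pairwise_block_matrix(transforms):
    """Stack all relative transformations T_ij = T_i T_j^{-1} into W."""
    k, d = len(transforms), len(transforms[0])
    w = [[0.0] * (k * d) for _ in range(k * d)]
    for i, ti in enumerate(transforms):
        for j, tj in enumerate(transforms):
            # Assumption: rotations only, so the inverse is the transpose.
            tij = matmul(ti, transpose(tj))
            for r in range(d):
                for c in range(d):
                    w[i * d + r][j * d + c] = tij[r][c]
    return w

def null_space_residual(transforms):
    """Largest absolute entry of (W - k I) U, with U the T_i stacked vertically."""
    k, d = len(transforms), len(transforms[0])
    w = pairwise_block_matrix(transforms)
    z = [[w[r][c] - (k if r == c else 0.0) for c in range(k * d)]
         for r in range(k * d)]
    u = [row for t in transforms for row in t]
    zu = matmul(z, u)
    return max(abs(x) for row in zu for x in row)

angles = [0.0, 0.4, 1.3, 2.9]  # illustrative values only
residual = null_space_residual([rot(a) for a in angles])
```

For transitively consistent inputs the residual is zero up to floating-point error. With noisy pairwise transformations it generally is not, which is why a synchronisation step would instead look for the subspace closest to a null space.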


VELDA: Relating an Image Tweet's Text and Images

AAAI Conferences

Image tweets are becoming a prevalent form of social media, but little is known about their content, both textual and visual, and the relationship between the two mediums. Our analysis of image tweets shows that while visual elements certainly play a large role in image-text relationships, other factors, such as emotional elements, also factor into the relationship. We develop Visual-Emotional LDA (VELDA), a novel topic model to capture the image-text correlation from multiple perspectives (namely, visual and emotional). Experiments on real-world image tweets in both English and Chinese, and on other user-generated content, show that VELDA significantly outperforms existing methods on cross-modality image retrieval. Even in other domains where emotion does not factor in image choice directly, our VELDA model demonstrates good generalization ability, achieving higher-fidelity modeling of such multimedia documents.


Cross-Modal Image Clustering via Canonical Correlation Analysis

AAAI Conferences

A new algorithm via Canonical Correlation Analysis (CCA) is developed in this paper to support more effective cross-modal image clustering for large-scale annotated image collections. It can be treated as a bi-media multimodal mapping problem and modeled as a correlation distribution over multimodal feature representations. It integrates the multimodal feature generation with the Locality Linear Coding (LLC) and co-occurrence association network, multimodal feature fusion with CCA, and accelerated hierarchical k-means clustering, which aims to characterize the correlations between the inter-related visual features in images and semantic features in captions, and measure their association degree more precisely. Very positive results were obtained in our experiments using a large quantity of public data.


Learning Face Hallucination in the Wild

AAAI Conferences

Face hallucination methods generate high-resolution images from low-resolution ones for better visualization. However, conventional hallucination methods are often designed for controlled settings and cannot handle varying conditions of pose, resolution degree, and blur. In this paper, we present a new method of face hallucination, which can consistently improve the resolution of face images even with large appearance variations. Our method is based on a novel network architecture called Bi-channel Convolutional Neural Network (Bi-channel CNN). It extracts robust face representations from raw input using a deep convolutional network, then adaptively integrates two channels of information (the raw input image and face representations) to predict the high-resolution image. Experimental results show our system outperforms the prior state-of-the-art methods.


Low-Rank Multi-View Learning in Matrix Completion for Multi-Label Image Classification

AAAI Conferences

Multi-label image classification is of significant interest due to its major role in real-world web image analysis applications such as large-scale image retrieval and browsing. Recently, matrix completion (MC) has been developed to deal with multi-label classification tasks. MC has distinct advantages, such as robustness to missing entries in the feature and label spaces and a natural ability to handle multi-label problems. However, current MC-based multi-label image classification methods only consider data represented by a single-view feature and therefore do not precisely characterize images that contain several semantic concepts. An intuitive way to utilize multiple features taken from different views is to concatenate the different features into a long vector; however, this concatenation is prone to over-fitting and leads to high time complexity in MC-based image classification. Therefore, we present a novel multi-view learning model for MC-based image classification, called low-rank multi-view matrix completion (lrMMC), which first seeks a low-dimensional common representation of all views by utilizing the proposed low-rank multi-view learning (lrMVL) algorithm. In lrMVL, the common subspace is constrained to be low rank so that it is suitable for MC. In addition, combination weights are learned to explore complementarity between different views. An efficient solver based on fixed-point continuation (FPC) is developed for optimization, and the learned low-rank representation is then incorporated into MC-based image classification. Extensive experimentation on the challenging PASCAL VOC'07 dataset demonstrates the superiority of lrMMC compared to other multi-label image classification approaches.


Integrating Image Clustering and Codebook Learning

AAAI Conferences

Image clustering and visual codebook learning are two fundamental problems in computer vision and they are tightly related. On one hand, a good codebook can generate effective feature representations which largely affect clustering performance. On the other hand, class labels obtained from image clustering can serve as supervised information to guide codebook learning. Traditionally, these two processes are conducted separately and their correlation is generally ignored. In this paper, we propose a Double Layer Gaussian Mixture Model (DLGMM) to simultaneously perform image clustering and codebook learning. In DLGMM, the two tasks are seamlessly coupled and can mutually promote each other. Cluster labels and codebook are jointly estimated to achieve the overall best performance. To incorporate the spatial coherence between neighboring visual patches, we propose a Spatially Coherent DLGMM which uses a Markov Random Field to encourage neighboring patches to share the same visual word label. We use variational inference to approximate the posterior of latent variables and learn model parameters. Experiments on two datasets demonstrate the effectiveness of the two models.


Visually Interpreting Names as Demographic Attributes by Exploiting Click-Through Data

AAAI Conferences

A person's name is strongly influenced by his/her cultural background, such as gender and ethnicity, both vital attributes for user profiling, attribute-based retrieval, etc. Typically, the associations between names and attributes (e.g., people named "Amy" are mostly female) are annotated manually or provided by government census data. We propose to associate a name with its likely demographic attributes by exploiting click-throughs between name queries and images with automatically detected facial attributes. This is the first work attempting to translate an abstract name to demographic attributes in a visual-data-driven manner, and it is adaptive to incremental data, more countries and even unseen names (names outside the click-through data) without additional manual labels. In the experiments, the automatic name-attribute associations help gender inference with accuracy competitive with that obtained using manual labeling. They also benefit the profiling of social media users and keyword-based face image retrieval, in particular contributing a 12% relative improvement in accuracy when adapting to unseen names.


A review of mean-shift algorithms for clustering

arXiv.org Machine Learning

A natural way to characterize the cluster structure of a dataset is by finding regions containing a high density of data. This can be done in a nonparametric way with a kernel density estimate, whose modes and hence clusters can be found using mean-shift algorithms. We describe the theory and practice behind clustering based on kernel density estimates and mean-shift algorithms. We discuss the blurring and non-blurring versions of mean-shift; theoretical results about mean-shift algorithms and Gaussian mixtures; relations with scale-space theory, spectral clustering and other algorithms; extensions to tracking, to manifold and graph data, and to manifold denoising; K-modes and Laplacian K-modes algorithms; acceleration strategies for large datasets; and applications to image segmentation, manifold denoising and multivalued regression.
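A minimal sketch of the non-blurring variant mentioned above, assuming 1D data, a Gaussian kernel and a fixed bandwidth: each point is moved repeatedly to the kernel-weighted mean of the (unchanged) dataset until it settles on a mode of the density estimate, and points that reach the same mode share a cluster. The function names, tolerances and sample values are illustrative assumptions, not taken from the review.

```python
import math

def mean_shift_mode(x, data, bandwidth, tol=1e-8, max_iter=500):
    """Follow mean-shift updates from x until the step size drops below tol."""
    for _ in range(max_iter):
        # Gaussian kernel weight of every data point relative to x.
        weights = [math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in data]
        new_x = sum(w * p for w, p in zip(weights, data)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x

def mean_shift_cluster(data, bandwidth, merge_tol=1e-3):
    """Assign each point to the mode it converges to; nearby modes are merged."""
    modes, labels = [], []
    for p in data:
        m = mean_shift_mode(p, data, bandwidth)
        for idx, existing in enumerate(modes):
            if abs(existing - m) < merge_tol:
                labels.append(idx)
                break
        else:
            # No existing mode is close enough: start a new cluster.
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels

data = [0.9, 1.0, 1.1, 1.2, 4.8, 5.0, 5.1, 5.3]  # two well-separated groups
modes, labels = mean_shift_cluster(data, bandwidth=0.5)
```

The number of clusters is not specified in advance; it follows from the bandwidth, which controls how many modes the kernel density estimate has.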


Consensus Message Passing for Layered Graphical Models

arXiv.org Artificial Intelligence

Generative models provide a powerful framework for probabilistic reasoning. However, in many domains their use has been hampered by the practical difficulties of inference. This is particularly the case in computer vision, where models of the imaging process tend to be large, loopy and layered. For this reason bottom-up conditional models have traditionally dominated in such domains. We find that widely-used, general-purpose message passing inference algorithms such as Expectation Propagation (EP) and Variational Message Passing (VMP) fail on the simplest of vision models. With these models in mind, we introduce a modification to message passing that learns to exploit their layered structure by passing 'consensus' messages that guide inference towards good solutions. Experiments on a variety of problems show that the proposed technique leads to significantly more accurate inference results, not only when compared to standard EP and VMP, but also when compared to competitive bottom-up conditional models.


Z.til

AI Classics

This paper describes some work on automatically generating finite counterexamples in topology and on the use of counterexamples to speed up proof discovery in intermediate analysis, and gives some example theorems where human provers are aided in proof discovery by the use of examples.