Vision
Histogram of Oriented Displacements (HOD): Describing Trajectories of Human Joints for Action Recognition
Gowayyed, Mohammad Abdelaziz (Alexandria University) | Torki, Marwan (Alexandria University) | Hussein, Mohammed Elsayed (Alexandria University) | El-Saban, Motaz (Microsoft Research)
Creating descriptors for trajectories has many applications in robotics, human motion analysis, and video copy detection. Here, we propose a novel descriptor for 2D trajectories: the Histogram of Oriented Displacements (HOD). Each displacement in the trajectory votes with its length in a histogram of orientation angles. 3D trajectories are described by the HOD of each of their three 2D projections. We use HOD to describe the 3D trajectories of body joints in order to recognize human actions, a challenging machine vision task with applications in human-robot/machine interaction, interactive entertainment, multimedia information retrieval, and surveillance. The descriptor is fixed-length, scale-invariant, and speed-invariant. Experiments on the MSR-Action3D and HDM05 datasets show that the descriptor outperforms the state of the art when used with off-the-shelf classification tools.
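As a rough illustration, the core voting scheme described above can be sketched in a few lines of NumPy. This is a minimal sketch, not the paper's full descriptor: the bin count, the normalization, and the choice of the three projection planes are assumptions here, and refinements such as a temporal pyramid are omitted.

```python
import numpy as np

def hod(trajectory, n_bins=8):
    """Histogram of Oriented Displacements for a 2D trajectory.

    Each displacement between consecutive points votes with its
    length into a histogram of orientation angles; normalizing the
    histogram makes the descriptor scale- and speed-invariant.
    """
    traj = np.asarray(trajectory, dtype=float)       # shape (T, 2)
    disp = np.diff(traj, axis=0)                     # consecutive displacements
    angles = np.arctan2(disp[:, 1], disp[:, 0])      # in (-pi, pi]
    lengths = np.hypot(disp[:, 0], disp[:, 1])
    bins = ((angles + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins, lengths)                   # vote with displacement length
    total = hist.sum()
    return hist / total if total > 0 else hist

def hod3d(trajectory, n_bins=8):
    """Describe a 3D trajectory by the HOD of its xy, xz, and yz projections."""
    traj = np.asarray(trajectory, dtype=float)       # shape (T, 3)
    return np.concatenate([hod(traj[:, [i, j]], n_bins)
                           for i, j in ((0, 1), (0, 2), (1, 2))])
```

The concatenated 3D descriptor has fixed length `3 * n_bins` regardless of how many frames the trajectory spans, which is what makes it usable with off-the-shelf classifiers.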
Regularized Latent Least Square Regression for Cross Pose Face Recognition
Cai, Xinyuan (Institute of Automation, Chinese Academy of Sciences) | Wang, Chunheng (Institute of Automation, Chinese Academy of Sciences) | Xiao, Baihua (Institute of Automation, Chinese Academy of Sciences) | Chen, Xue (Institute of Automation, Chinese Academy of Sciences) | Zhou, Ji (Institute of Automation, Chinese Academy of Sciences)
Pose variation is one of the most challenging factors for face recognition. In this paper, we propose a novel cross-pose face recognition method named Regularized Latent Least Square Regression (RLLSR). The basic assumption is that the images of one person captured under different poses can be viewed as pose-specific transforms of a single ideal object. We treat the observed images as the regressors and the ideal object as the response, and formulate this assumption in the least square regression framework so as to learn the multiple pose-specific transforms. Specifically, we incorporate prior knowledge as two regularization terms in the least square approach: 1) a smoothness regularization, since the transforms for nearby poses should not differ too much; 2) a local consistency constraint, since the distribution of the latent ideal objects should preserve the geometric structure of the observed image space. We develop an alternating algorithm to simultaneously solve for the ideal objects of the training individuals and a set of pose-specific transforms. Experimental results on the Multi-PIE dataset demonstrate the effectiveness of the proposed method and its superiority over previous methods.
Bilevel Visual Words Coding for Image Classification
Zhang, Jiemi (Zhejiang University) | Wu, Chenxia (Zhejiang University) | Cai, Deng (Zhejiang University) | Zhu, Jianke (Zhejiang University)
The Bag-of-Words approach has played an important role in recent work on image classification. For efficiency, most methods use k-means clustering to generate the codebook. The resulting codebooks often lose cluster size and shape information, with distortion errors and low discriminative power. Though some efforts have been made to optimize the codebook via sparse coding, they usually incur a higher computational cost. Moreover, they ignore the correlations between codes in the subsequent coding stage, which leads to low discriminative power of the final representation. In this paper, we propose a bilevel visual words coding approach that balances representation ability, discriminative power, and efficiency. In the bilevel codebook generation stage, k-means and an efficient spectral clustering are run in the two levels respectively, taking both class information and the shape of each visual word cluster into account. To obtain a discriminative representation in the coding stage, we design a localized coding rule with the bilevel codebook to select local bases. To code efficiently under this rule, an online method is proposed to learn a projection of local descriptors onto the visual words in the codebook. After projection, coding can be completed efficiently by a low-dimensional localized soft-assignment. Experimental results show that our bilevel visual words coding approach outperforms state-of-the-art approaches for image classification.
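The final coding step the abstract mentions, localized soft-assignment, can be sketched generically as follows. This is a hedged illustration of the standard technique, not the paper's method: the neighborhood size `k`, the kernel parameter `beta`, and the use of a single flat codebook rather than the paper's bilevel one are all assumptions.

```python
import numpy as np

def localized_soft_assignment(descriptors, codebook, k=5, beta=10.0):
    """Localized soft-assignment coding over the k nearest visual words.

    Each local descriptor is encoded only on its k closest codewords,
    with Gaussian-kernel weights that sum to one; all other entries
    stay zero, giving a sparse, localized code.
    """
    X = np.asarray(descriptors, dtype=float)   # (N, d) local descriptors
    C = np.asarray(codebook, dtype=float)      # (M, d) visual words
    # squared Euclidean distances between every descriptor and codeword
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)      # (N, M)
    codes = np.zeros_like(d2)
    nn = np.argsort(d2, axis=1)[:, :k]                       # k nearest words
    for i, idx in enumerate(nn):
        w = np.exp(-beta * d2[i, idx])
        codes[i, idx] = w / w.sum()            # normalize over the local bases
    return codes
```

Restricting the soft assignment to the `k` nearest words is what keeps the code both sparse and cheap to compute once descriptors have been projected into the low-dimensional space.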
Scalable $k$-NN Graph Construction
Wang, Jingdong | Wang, Jing | Zeng, Gang | Tu, Zhuowen | Gan, Rui | Li, Shipeng
The $k$-NN graph has played a central role in increasingly popular data-driven techniques for various learning and vision tasks; yet finding an efficient and effective way to construct $k$-NN graphs remains a challenge, especially for large-scale, high-dimensional data. In this paper, we propose a new approach to constructing approximate $k$-NN graphs with an emphasis on efficiency and accuracy. We hierarchically and randomly divide the data points into subsets and build an exact neighborhood graph over each subset, yielding a base approximate neighborhood graph; we then repeat this process several times to generate multiple neighborhood graphs, which are combined into a more accurate approximate neighborhood graph. Furthermore, we propose a neighborhood propagation scheme to further enhance the accuracy. We show both theoretically and empirically that our approach is accurate and efficient, and demonstrate a significant speed-up in dealing with large-scale visual data.
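The divide-and-conquer construction can be sketched as follows. This is a minimal sketch under assumptions: random-pivot splits stand in for the paper's hierarchical division scheme, `leaf_size` and the number of repetitions are arbitrary, and the neighborhood propagation step is omitted.

```python
import numpy as np
from itertools import combinations

def _random_split(indices, X, leaf_size, rng, leaves):
    """Recursively split points by assigning each to the closer of two random pivots."""
    if len(indices) <= leaf_size:
        leaves.append(indices)
        return
    a, b = rng.choice(indices, size=2, replace=False)
    da = ((X[indices] - X[a]) ** 2).sum(1)
    db = ((X[indices] - X[b]) ** 2).sum(1)
    mask = da < db
    if mask.all() or not mask.any():           # degenerate split: halve arbitrarily
        mask = np.zeros(len(indices), bool)
        mask[: len(indices) // 2] = True
    _random_split(indices[mask], X, leaf_size, rng, leaves)
    _random_split(indices[~mask], X, leaf_size, rng, leaves)

def approximate_knn_graph(X, k=3, leaf_size=20, n_repeats=4, seed=0):
    """Approximate k-NN graph: randomly divide the points into small subsets,
    build exact neighborhoods inside each subset, and merge the candidate
    neighbors found over several random divisions."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    rng = np.random.default_rng(seed)
    cand = [set() for _ in range(n)]
    for _ in range(n_repeats):
        leaves = []
        _random_split(np.arange(n), X, leaf_size, rng, leaves)
        for leaf in leaves:                     # exact graph within each subset
            for i, j in combinations(leaf, 2):
                cand[i].add(j)
                cand[j].add(i)
    graph = []
    for i in range(n):                          # keep the k best candidates per point
        cs = np.fromiter(cand[i], int)
        d = ((X[cs] - X[i]) ** 2).sum(1)
        graph.append(cs[np.argsort(d)[:k]].tolist())
    return graph
```

Each repetition costs only the exact-graph work inside small leaves, and merging candidates across repetitions is what recovers neighbors that any single random division cuts apart.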
Integration of 3D Object Recognition and Planning for Robotic Manipulation: A Preliminary Report
Duff, Damien Jade | Erdem, Esra | Patoglu, Volkan
We investigate different approaches to integrating object recognition and planning in a tabletop manipulation domain with the set of objects used in the 2012 RoboCup@Work competition. Results of our preliminary experiments show that, with some approaches, close integration of perception and planning improves the quality of plans, as well as the computation times of feasible plans.
Vector-Valued Multi-View Semi-Supervised Learning for Multi-Label Image Classification
Luo, Yong (Peking University) | Tao, Dacheng (University of Technology, Sydney) | Xu, Chang (Peking University) | Li, Dongchen (Peking University) | Xu, Chao (Peking University)
Images are usually associated with multiple labels and comprised of multiple views, since each image contains several objects (e.g., a pedestrian, a bicycle, and a tree) and multiple visual features (e.g., color, texture, and shape). Currently available tools tend to use either the labels or the features for classification, but both are necessary to describe the image properly. There have been recent successes in using vector-valued functions, which construct matrix-valued kernels, to explore the multi-label structure in the output space. This has motivated us to develop multi-view vector-valued manifold regularization (MV$^3$MR) in order to integrate multiple features. MV$^3$MR exploits the complementary properties of different features and discovers the intrinsic local geometry of the compact support shared by the different features, under the theme of manifold regularization. We validate the effectiveness of the proposed MV$^3$MR methodology for image classification through extensive experiments on two challenging datasets, PASCAL VOC'07 and MIR Flickr.
Gradient Networks: Explicit Shape Matching Without Extracting Edges
Hsiao, Edward (Carnegie Mellon University) | Hebert, Martial (Carnegie Mellon University)
We present a novel framework for shape-based template matching in images. While previous approaches required brittle contour extraction, considered only local information, or used coarse statistics, we propose to match the shape explicitly on low-level gradients by formulating the problem as traversing paths in a gradient network. We evaluate our algorithm on a challenging dataset of objects in cluttered environments and demonstrate significant improvement over state-of-the-art methods for shape matching and object detection.
A Cyclic Weighted Median Method for L1 Low-Rank Matrix Factorization with Missing Entries
Meng, Deyu (Xi'an Jiaotong University) | Xu, Zongben (Xi'an Jiaotong University) | Zhang, Lei (The Hong Kong Polytechnic University) | Zhao, Ji (Carnegie Mellon University)
A challenging problem in machine learning, information retrieval, and computer vision research is how to recover a low-rank representation of given data in the presence of outliers and missing entries. L1-norm low-rank matrix factorization (LRMF) has been a popular approach to this problem. However, L1-norm LRMF is difficult to solve due to its non-convexity and non-smoothness, and existing methods are often inefficient and fail to converge to a desired solution. In this paper, we propose a novel cyclic weighted median (CWM) method, which is intrinsically a coordinate descent algorithm, for L1-norm LRMF. The CWM method minimizes the objective by solving a sequence of scalar minimization sub-problems, each of which is convex and can be easily solved by the weighted median filter. Extensive experimental results validate that the CWM method outperforms state-of-the-art methods in terms of both accuracy and computational efficiency.
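The cyclic weighted median idea, solving each scalar entry of the factors by a weighted median, can be sketched as follows. This is a hedged sketch, not the paper's exact algorithm: the random initialization, the fixed sweep count, and the update order are assumptions. The key fact it relies on is that, with one factor fixed, each scalar sub-problem $\min_u \sum_i w_i |x_i - u\,v_i|$ reduces to a weighted median of the ratios $x_i / v_i$ with weights $w_i |v_i|$.

```python
import numpy as np

def weighted_median(values, weights):
    """Return the point minimizing sum_i weights[i] * |values[i] - u|."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cw = np.cumsum(w)
    return v[np.searchsorted(cw, 0.5 * cw[-1])]

def cwm_l1_factorization(X, rank, mask=None, n_iters=30, seed=0):
    """Cyclic weighted median sketch for L1 low-rank factorization X ~ U @ V.T.

    Each scalar entry of U and V is updated in turn by a convex
    one-dimensional L1 problem solved with a weighted median; `mask`
    marks observed entries (missing entries get zero weight).
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = np.ones_like(X, dtype=float) if mask is None else mask.astype(float)
    U = rng.standard_normal((m, rank))
    V = rng.standard_normal((n, rank))
    for _ in range(n_iters):
        for r in range(rank):
            # residual with component r removed; it does not depend on
            # U[:, r] or V[:, r], so it stays valid through their updates
            R = X - U @ V.T + np.outer(U[:, r], V[:, r])
            for i in range(m):          # update U[i, r]
                ok = (W[i] > 0) & (np.abs(V[:, r]) > 1e-12)
                if ok.any():
                    U[i, r] = weighted_median(R[i, ok] / V[ok, r],
                                              W[i, ok] * np.abs(V[ok, r]))
            for j in range(n):          # update V[j, r]
                ok = (W[:, j] > 0) & (np.abs(U[:, r]) > 1e-12)
                if ok.any():
                    V[j, r] = weighted_median(R[ok, j] / U[ok, r],
                                              W[ok, j] * np.abs(U[ok, r]))
    return U, V
```

Because every sub-problem is convex and solved exactly, each sweep can only decrease the L1 objective, which is the source of the method's stability.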
Supervised and Projected Sparse Coding for Image Classification
Huang, Jin (University of Texas at Arlington) | Nie, Feiping (University of Texas at Arlington) | Huang, Heng (University of Texas at Arlington) | Ding, Chris (University of Texas at Arlington)
The classic sparse representation for classification (SRC) method fails to incorporate the label information of training images and, meanwhile, has poor scalability due to the expensive computation of the $\ell_1$ norm. In this paper, we propose a novel subspace sparse coding method that utilizes label information to effectively classify images in the subspace. Our new approach unifies the tasks of dimension reduction and supervised sparse vector learning by simultaneously preserving the sparse structure of the data and seeking the optimal projection direction in the training stage, thereby accelerating the classification process in the test stage. Our method achieves both flat and structured sparsity in the vector representations, making our framework more discriminative during subspace learning and the subsequent classification. Empirical results on four benchmark datasets demonstrate the effectiveness of our method.
Vesselness Features and the Inverse Compositional AAM for Robust Face Recognition Using Thermal IR
Ghiass, Reza Shoja (Laval University) | Arandjelovic, Ognjen (Deakin University) | Bendada, Hakim (Laval University) | Maldague, Xavier (Laval University)
Over the course of the last decade, infrared (IR) and particularly thermal IR imaging based face recognition has emerged as a promising complement to conventional, visible spectrum based approaches, which continue to struggle when applied in the real world. While inherently insensitive to visible spectrum illumination changes, IR images introduce specific challenges of their own, most notably sensitivity to factors which affect facial heat emission patterns, e.g. emotional state, ambient temperature, and alcohol intake. In addition, facial expression and pose changes are more difficult to correct in IR images because they are less rich in high-frequency detail, which is an important cue for fitting any deformable model. In this paper we describe a novel method which addresses these major challenges. Specifically, to normalize for pose and facial expression changes we generate a synthetic frontal image of a face in a canonical, neutral facial expression from an image of the face in an arbitrary pose and facial expression. This is achieved by piecewise affine warping which follows active appearance model (AAM) fitting. This is the first publication that explores the use of an AAM on thermal IR images; we propose a pre-processing step which enhances detail in thermal images, making AAM convergence faster and more accurate. To overcome the sensitivity of thermal IR images to the exact pattern of facial temperature emissions, we describe a representation based on reliable anatomical features. In contrast to previous approaches, our representation is not binary; rather, our method accounts for the reliability of the extracted features. This makes the proposed representation much more robust to both pose and scale changes. The effectiveness of the proposed approach is demonstrated on the largest public database of thermal IR images of faces, on which it achieved a 100% identification rate, significantly outperforming previously described methods.