Asia
Scalable Graph Hashing with Feature Transformation
Jiang, Qing-Yuan (Nanjing University) | Li, Wu-Jun (Nanjing University)
Hashing has been widely used for approximate nearest neighbor (ANN) search in big data applications because of its low storage cost and fast retrieval speed. The goal of hashing is to map the data points from the original space into a binary-code space where the similarity (neighborhood structure) in the original space is preserved. By directly exploiting the similarity to guide the hashing code learning procedure, graph hashing has attracted much attention. However, most existing graph hashing methods cannot achieve satisfactory performance in real applications due to the high complexity of graph modeling. In this paper, we propose a novel method, called scalable graph hashing with feature transformation (SGH), for large-scale graph hashing. Through feature transformation, we can effectively approximate the whole graph without explicitly computing the similarity graph matrix, based on which a sequential learning method is proposed to learn the hash functions in a bit-wise manner. Experiments on two datasets with one million data points show that our SGH method can outperform the state-of-the-art methods in terms of both accuracy and scalability.
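To make the idea of mapping points into a similarity-preserving binary-code space concrete, here is a minimal sketch. It is not the SGH method described above: it assumes plain random-hyperplane hashing as a stand-in, and the names `make_hyperplanes`, `hash_code` and `hamming` are hypothetical helpers chosen for illustration.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in a dim-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(x, hyperplanes):
    """Binary code of x: one bit per hyperplane, set by the sign of the projection."""
    return tuple(
        1 if sum(w_i * x_i for w_i, x_i in zip(w, x)) >= 0 else 0
        for w in hyperplanes
    )


def hamming(a, b):
    """Number of bit positions where two codes differ."""
    return sum(u != v for u, v in zip(a, b))


planes = make_hyperplanes(dim=3, n_bits=16)
x = [1.0, 2.0, 3.0]
same_direction = [2.0, 4.0, 6.0]       # same direction as x, so same code
opposite = [-1.0, -2.0, -3.0]          # opposite direction, so every bit flips
print(hamming(hash_code(x, planes), hash_code(same_direction, planes)))
print(hamming(hash_code(x, planes), hash_code(opposite, planes)))
```

Retrieval then compares short binary codes by Hamming distance instead of comparing the original vectors, which is where the low storage cost and fast lookup come from. A learned method such as the one in the abstract would replace the random hyperplanes with hash functions fitted to the data.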
Semantic Single Video Segmentation with Robust Graph Representation
Zhao, Handong (Northeastern University) | Fu, Yun (Northeastern University)
Graph-based video segmentation has proven influential in recent work. However, most existing approaches fail to produce a semantic segmentation of the foreground objects, i.e., all the segmented objects are treated as one class. In this paper, we propose an approach to semantically segment the multi-class foreground objects from a single video sequence. To achieve this, we first generate a set of proposals for each frame and score them based on motion and appearance features. With these scores, the similarities between proposals are measured. To tackle the vulnerability of the graph-based model, low-rank representation with an l21-norm regularizer for outlier detection is proposed to discover the intrinsic structure among proposals. With the "clean" graph representation, objects of different classes are more likely to be grouped into separate clusters. Two public datasets, MOViCS and ObMiC, are used for evaluation under both intersection-over-union and F-measure metrics. The superior results compared with state-of-the-art methods demonstrate the effectiveness of the proposed method.
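For readers unfamiliar with the l21-norm mentioned above, the following sketch computes it under one common convention, assumed here: the sum of the Euclidean norms of the columns of a matrix. The abstract does not state which convention the authors use, and `l21_norm` is a hypothetical helper name.

```python
import math


def l21_norm(matrix):
    """Sum of the Euclidean (l2) norms of the columns of a row-major matrix.

    Assumption: column-wise convention, as often used in low-rank
    representation, where each column holds the error of one sample.
    """
    n_cols = len(matrix[0])
    return sum(
        math.sqrt(sum(row[j] ** 2 for row in matrix))
        for j in range(n_cols)
    )


# One column with norm 5 and one all-zero column.
error = [[3.0, 0.0],
         [4.0, 0.0]]
print(l21_norm(error))
```

Penalizing this quantity encourages entire columns to be zero, so a few samples can absorb large errors (outliers) while the rest stay clean.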
Saliency Detection with a Deeper Investigation of Light Field
Zhang, Jun (Hefei University of Technology) | Wang, Meng (Hefei University of Technology) | Gao, Jun (Hefei University of Technology) | Wang, Yi (Hefei University of Technology) | Zhang, Xudong (Hefei University of Technology) | Wu, Xindong (Hefei University of Technology)
Although the light field has recently been recognized as helpful in saliency detection, it has not yet been comprehensively explored. In this work, we propose a new saliency detection model with light field data. The idea behind the proposed model originates from the following observations. (1) People can distinguish regions at different depth levels by adjusting the focus of their eyes. Similarly, a light field image can generate a set of focal slices focusing at different depth levels, which suggests that a background can be weighted by selecting the corresponding slice. We show that background priors encoded by light field focusness have advantages in eliminating background distraction and enhancing the saliency by weighting the light field contrast. (2) Regions at closer depth ranges tend to be salient, while regions far in the distance mostly belong to the background. We show that foreground objects can be easily separated from similar or cluttered backgrounds by exploiting their light field depth. We conduct extensive evaluations on the recently introduced Light Field Saliency Dataset (LFSD) [Li et al., 2014], including studies of different light field cues and comparisons with Li et al.'s method (the only reported light field saliency detection approach to our knowledge) and with 2D/3D state-of-the-art approaches extended with light field depth/focusness information. The results show that the investigated light field properties are complementary to each other and lead to improvements on 2D/3D models, and that our approach produces superior results in comparison with the state of the art.
Generalized Transitive Distance with Minimum Spanning Random Forest
Yu, Zhiding (Carnegie Mellon University) | Liu, Weiyang (Peking University) | Liu, Wenbo (Carnegie Mellon University) | Peng, Xi (Research Agency for Science, Technology and Research (A*STAR) Singapore) | Hui, Zhuo (Carnegie Mellon University) | Kumar, B. V. K. Vijaya (Carnegie Mellon University)
Transitive distance is an ultrametric with elegant properties for clustering. Conventional transitive distance can be found by referring to the minimum spanning tree (MST). We show that this distance metric can be generalized onto a minimum spanning random forest (MSRF) with element-wise max pooling over the set of transitive distance matrices from an MSRF. Our proposed approach is both intuitively reasonable and theoretically attractive. Intuitively, max pooling alleviates undesired short links with a single MST when noise is present. Theoretically, one can see that the distance metric obtained by max pooling is still an ultrametric, rendering many good clustering properties. Comprehensive experiments on data clustering and image segmentation show that MSRF with max pooling improves the clustering performance over a single MST and achieves state-of-the-art performance on the Berkeley Segmentation Dataset.
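The two ingredients named above, transitive distance from an MST and element-wise max pooling, can be sketched as follows. This is a minimal illustration under assumptions: `transitive_distance` uses a Kruskal-style merge, in which the edge that first joins two components is the largest edge on the MST path between their points, and `max_pool` simply takes a list of already computed matrices. How the trees of the random forest are generated is not described in the abstract and is not modeled here. Both function names are hypothetical.

```python
def transitive_distance(dist):
    """Transitive (minimax-path) distance matrix from a symmetric distance matrix.

    Edges are processed in increasing order, as in Kruskal's MST algorithm.
    When an edge of weight w merges two components, w is the transitive
    distance between every pair of points drawn from the two components.
    """
    n = len(dist)
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    comp = list(range(n))                    # component label of each point
    members = {i: [i] for i in range(n)}     # points in each component
    td = [[0.0] * n for _ in range(n)]
    for w, i, j in edges:
        a, b = comp[i], comp[j]
        if a == b:
            continue                         # edge is not part of the MST
        for u in members[a]:
            for v in members[b]:
                td[u][v] = td[v][u] = w
        for v in members[b]:
            comp[v] = a
        members[a].extend(members.pop(b))
    return td


def max_pool(matrices):
    """Element-wise maximum over a list of equally sized square matrices."""
    n = len(matrices[0])
    return [[max(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]


# Three points on a line at positions 0, 1 and 3.
dist = [[0, 1, 3],
        [1, 0, 2],
        [3, 2, 0]]
td = transitive_distance(dist)
print(td)  # the 0-2 distance drops from 3 to 2, the largest hop on the path 0-1-2

other = [[0, 3, 3],
         [3, 0, 1],
         [3, 1, 0]]                          # a second, hypothetical ultrametric
print(max_pool([td, other]))
```

In this small example the pooled matrix still satisfies the ultrametric inequality d(i, j) <= max(d(i, k), d(k, j)), in line with the property claimed in the abstract.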
Trailer Generation via a Point Process-Based Visual Attractiveness Model
Xu, Hongteng (Georgia Institute of Technology) | Zhen, Yi (Georgia Institute of Technology) | Zha, Hongyuan (Georgia Institute of Technology)
Producing attractive trailers for videos needs human expertise and creativity, and hence is challenging and costly. Different from video summarization that focuses on capturing storylines or important scenes, trailer generation aims at producing trailers that are attractive so that viewers will be eager to watch the original video. In this work, we study the problem of automatic trailer generation, in which an attractive trailer is produced given a video and a piece of music. We propose a surrogate measure of video attractiveness named fixation variance, and learn a novel self-correcting point process-based attractiveness model that can effectively describe the dynamics of attractiveness of a video. Furthermore, based on the attractiveness model learned from existing training trailers, we propose an efficient graph-based trailer generation algorithm to produce a max-attractiveness trailer. Experiments demonstrate that our approach outperforms the state-of-the-art trailer generators in terms of both quality and efficiency.
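As background on the term "self-correcting point process", one textbook form has conditional intensity exp(mu * t - beta * N(t)), where N(t) counts the events before time t: the intensity grows while nothing happens and drops after each event. The sketch below assumes that generic form only. It is not the attractiveness model learned in this work, and the function name and the parameters `mu` and `beta` are illustrative.

```python
import math


def self_correcting_intensity(t, event_times, mu=1.0, beta=1.0):
    """Conditional intensity exp(mu * t - beta * N(t)) of a generic
    self-correcting point process, with N(t) the number of events before t."""
    n_before = sum(1 for s in event_times if s < t)
    return math.exp(mu * t - beta * n_before)


events = [1.0]
# Intensity rises until the event at t = 1.0, then is pulled back down.
print(self_correcting_intensity(0.9, events))
print(self_correcting_intensity(1.5, events))
```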
Face Clustering in Videos with Proportion Prior
Tang, Zhiqiang (Chinese Academy of Sciences) | Zhang, Yifan (Chinese Academy of Sciences) | Li, Zechao (Nanjing University of Science and Technology) | Lu, Hanqing (Chinese Academy of Sciences)
In this paper, we investigate the problem of face clustering in real-world videos. In many cases, the distribution of the face data is unbalanced. In movies or TV series, the leading cast members appear quite often and the others appear much less. However, many clustering algorithms cannot handle such severe imbalance in the data distribution well: the large class is split apart, and the small classes are merged into the large ones and thus lost. On the other hand, the data distribution proportion information may be known beforehand. For example, we can obtain such information by counting the spoken lines of the characters in the script text. Hence, we propose to make use of the proportion prior to regularize the clustering. A Hidden Conditional Random Field (HCRF) model is presented to incorporate the proportion prior. In experiments on a public data set from real-world videos, we observe improvements in clustering performance over state-of-the-art methods.
Adaptive Sharing for Image Classification
Shen, Li (University of Chinese Academy of Sciences) | Sun, Gang (University of Chinese Academy of Sciences) | Lin, Zhouchen (Peking University) | Huang, Qingming (University of Chinese Academy of Sciences and Chinese Academy of Sciences) | Wu, Enhua (Chinese Academy of Sciences and University of Macau)
In this paper, we formulate the image classification problem in a multi-task learning framework. We propose a novel method to adaptively share information among tasks (classes). Different from imposing strong assumptions or discovering specific structures, the key insight in our method is to selectively extract and exploit the shared information among classes while capturing respective disparities simultaneously. It is achieved by estimating a composite of two sets of parameters with different regularization. Besides applying it for learning classifiers on pre-computed features, we also integrate the adaptive sharing with deep neural networks, whose discriminative power can be augmented by encoding class relationship. We further develop two strategies for solving the optimization problems in the two scenarios. Empirical results demonstrate that our method can significantly improve the classification performance by transferring knowledge appropriately.
Salient Object Detection via Augmented Hypotheses
Nguyen, Tam Van (Singapore Polytechnic) | Sepulveda, Jose (Singapore Polytechnic)
In this paper, we propose using augmented hypotheses which consider objectness, foreground and compactness for salient object detection. Our algorithm consists of four basic steps. First, our method generates the objectness map via objectness hypotheses. Based on the objectness map, we estimate the foreground margin and compute the corresponding foreground map which prefers the foreground objects. From the objectness map and the foreground map, the compactness map is formed to favor the compact objects. We then derive a saliency measure that produces a pixel-accurate saliency map which uniformly covers the objects of interest and consistently separates fore- and background. We finally evaluate the proposed framework on two challenging datasets, MSRA-1000 and iCoSeg. Our extensive experimental results show that our method outperforms state-of-the-art approaches.
Social Image Parsing by Cross-Modal Data Refinement
Lu, Zhiwu (Renmin University of China) | Gao, Xin (KAUST) | Huang, Songfang (IBM China Research Lab) | Wang, Liwei (Peking University) | Wen, Ji-Rong (Renmin University of China)
This paper presents a cross-modal data refinement algorithm for social image parsing, or segmenting all the objects within a social image and then identifying their categories. Different from the traditional fully supervised image parsing that takes pixel-level labels as strong supervisory information, our social image parsing is initially provided with the noisy tags of images (i.e., image-level labels), which are shared by social users. By oversegmenting each image into multiple regions, we formulate social image parsing as a cross-modal data refinement problem over a large set of regions, where the initial labels of each region are inferred from image-level labels. Furthermore, we develop an efficient algorithm to solve this cross-modal data refinement problem. The experimental results on several benchmark datasets show the effectiveness of our algorithm. More notably, our algorithm can be considered to provide an alternative and natural way to address the challenging problem of image parsing, since image-level labels are much easier to access than pixel-level labels.
Inferring Painting Style with Multi-Task Dictionary Learning
Liu, Gaowen (University of Trento) | Yan, Yan (University of Trento and ADSC) | Ricci, Elisa (Fondazione Bruno Kessler) | Yang, Yi (University of Technology Sydney) | Han, Yahong (Tianjin University) | Winkler, Stefan (ADSC, UIUC) | Sebe, Nicu (University of Trento)
Recent advances in imaging and multimedia technologies have paved the way for automatic analysis of visual art. Despite notable attempts, extracting relevant patterns from paintings is still a challenging task. Different painters, born in different periods and places, have been influenced by different schools of arts. However, each individual artist also has a unique signature, which is hard to detect with algorithms and objective features. In this paper we propose a novel dictionary learning approach to automatically uncover the artistic style from paintings. Specifically, we present a multi-task learning algorithm to learn a style-specific dictionary representation. Intuitively, our approach, by automatically decoupling style-specific and artist-specific patterns, is expected to be more accurate for retrieval and recognition tasks than generic methods. To demonstrate the effectiveness of our approach, we introduce the DART dataset, containing more than 1.5K images of paintings representative of different styles. Our extensive experimental evaluation shows that our approach significantly outperforms state-of-the-art methods.