Institute of Automation, Chinese Academy of Sciences
A Dynamic Window Neural Network for CCG Supertagging
Wu, Huijia (Institute of Automation, Chinese Academy of Sciences) | Zhang, Jiajun (Institute of Automation, Chinese Academy of Sciences) | Zong, Chengqing (Institute of Automation, Chinese Academy of Sciences)
Combinatory Categorial Grammar (CCG) supertagging is the task of assigning lexical categories to each word in a sentence. Almost all previous methods use fixed context window sizes to encode input tokens. However, different tags usually depend on context windows of different sizes. This motivates us to build a supertagger with a dynamic window approach, which can be viewed as an attention mechanism over the local context. We find that applying dropout to the dynamic filters is superior to regular dropout on word embeddings. Using this approach, we achieve state-of-the-art CCG supertagging performance on the standard test set.
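The abstract frames the dynamic window as attention over local contexts, with dropout applied to the dynamic filters. A minimal sketch of that idea follows, assuming dot-product scoring against a hypothetical `query` vector, softmax weights over a window of at most `2 * radius + 1` tokens, and dropout applied to those weights; the function and its parameters are illustrative assumptions, not the authors' architecture:

```python
import math
import random
from typing import List, Optional

Vector = List[float]


def dynamic_window(embeddings: List[Vector], center: int, query: Vector,
                   radius: int = 2, drop_p: float = 0.0,
                   rng: Optional[random.Random] = None) -> Vector:
    """Attention-weighted summary of the tokens around position `center`.

    Hypothetical sketch: tokens in [center - radius, center + radius] (clipped
    at sentence boundaries) are scored by a dot product with `query`, and a
    softmax turns the scores into weights, so how much each context position
    contributes is data-dependent rather than fixed.
    """
    if not 0.0 <= drop_p < 1.0:
        raise ValueError("drop_p must be in [0, 1)")
    lo, hi = max(0, center - radius), min(len(embeddings), center + radius + 1)
    window = embeddings[lo:hi]
    scores = [sum(q * x for q, x in zip(query, emb)) for emb in window]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    if drop_p > 0.0:
        # Dropout on the attention weights (the "dynamic filter") rather than
        # on the word embeddings; kept weights are rescaled by 1 / (1 - p).
        rng = rng or random.Random()
        weights = [0.0 if rng.random() < drop_p else w / (1.0 - drop_p)
                   for w in weights]
    dim = len(window[0])
    return [sum(w * emb[d] for w, emb in zip(weights, window)) for d in range(dim)]
```

With a zero query every position in the window gets equal weight, so `dynamic_window([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], 1, [0.0, 0.0], radius=1)` simply averages the three embeddings; a query aligned with some tokens shifts weight toward them, which is the sense in which the effective window becomes input-dependent.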
SAPE: A System for Situation-Aware Public Security Evaluation
Wu, Shu (Institute of Automation, Chinese Academy of Sciences) | Liu, Qiang (Institute of Automation, Chinese Academy of Sciences) | Bai, Ping (Institute of Automation, Chinese Academy of Sciences) | Wang, Liang (Institute of Automation, Chinese Academy of Sciences) | Tan, Tieniu (Institute of Automation, Chinese Academy of Sciences)
Public security events occur all over the world, threatening personal and property safety as well as homeland security. It is vital to construct an effective model to evaluate and predict public security. In this work, we establish a Situation-Aware Public Security Evaluation (SAPE) platform. Based on conventional Recurrent Neural Networks (RNN), we develop a new RNN variant to handle temporal contexts in public security event datasets. The proposed model achieves better performance than the compared state-of-the-art methods. SAPE offers two demonstrations: global public security evaluation and China public security evaluation. In the global part, based on the Global Terrorism Database from UMD, SAPE predicts, for each country, the risk level and the top-n potential terrorist organizations that might attack the country. Users can also view the actual attacking organizations alongside the predicted results. For each province in China, SAPE predicts the risk level and the probability scores of different types of events in the next month. Users can also view the actual numbers of events and the predicted risk levels over the past year.
Shoot to Know What: An Application of Deep Networks on Mobile Devices
Wu, Jiaxiang (Institute of Automation, Chinese Academy of Sciences) | Hu, Qinghao (Institute of Automation, Chinese Academy of Sciences) | Leng, Cong (Institute of Automation, Chinese Academy of Sciences) | Cheng, Jian (Institute of Automation, Chinese Academy of Sciences)
Convolutional neural networks (CNNs) have achieved impressive performance in a wide range of computer vision areas. However, deploying them on mobile devices remains impractical due to their high computational complexity. In this demo, we propose the Quantized CNN (Q-CNN), an efficient framework for CNN models, to achieve efficient and accurate image classification on mobile devices. Our Q-CNN framework dramatically accelerates computation and reduces storage and memory consumption, so that mobile devices can independently run an ImageNet-scale CNN model. Experiments on the ILSVRC-12 dataset demonstrate a 4 to 6x speed-up and 15 to 20x compression, with merely a one-percent drop in classification accuracy. With the Q-CNN framework, even mobile devices can accurately classify images within one second.
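The abstract does not say how Q-CNN quantizes a network. To make the storage side of such compression figures concrete, here is a deliberately simpler stand-in, scalar k-means weight sharing, in which each weight is replaced by an index into a small codebook. It is not the Q-CNN method, and the function names are hypothetical:

```python
import math
from typing import List, Tuple


def kmeans_quantize(weights: List[float], k: int,
                    iters: int = 20) -> Tuple[List[float], List[int]]:
    """Scalar k-means weight sharing: returns (codebook, per-weight indices).

    Illustrative stand-in only; not the Q-CNN algorithm.
    """
    if not 1 <= k <= len(weights):
        raise ValueError("need 1 <= k <= len(weights)")
    ordered = sorted(weights)
    # Initialise centroids at evenly spaced quantiles of the weights.
    codebook = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]

    def assign() -> List[int]:
        # Index of the nearest codeword for every weight.
        return [min(range(k), key=lambda c: abs(w - codebook[c])) for w in weights]

    for _ in range(iters):
        idx = assign()
        for c in range(k):
            members = [w for w, a in zip(weights, idx) if a == c]
            if members:  # leave empty clusters where they are
                codebook[c] = sum(members) / len(members)
    return codebook, assign()


def compression_ratio(n_weights: int, k: int, float_bits: int = 32) -> float:
    """Original float storage divided by (index bits + codebook bits)."""
    index_bits = max(1, math.ceil(math.log2(k)))
    return (n_weights * float_bits) / (n_weights * index_bits + k * float_bits)
```

With a 16-entry codebook each 32-bit weight becomes a 4-bit index, so `compression_ratio(1_000_000, 16)` comes out just under 8x; smaller codebooks raise the ratio at the cost of coarser weights. This sketch addresses storage only and says nothing about the speed-up, which depends on how quantized layers are evaluated.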
Metric Embedded Discriminative Vocabulary Learning for High-Level Person Representation
Yang, Yang (Institute of Automation, Chinese Academy of Sciences) | Lei, Zhen (Institute of Automation, Chinese Academy of Sciences) | Zhang, Shifeng (Institute of Automation, Chinese Academy of Sciences) | Shi, Hailin (Institute of Automation, Chinese Academy of Sciences) | Li, Stan Z. (Institute of Automation, Chinese Academy of Sciences)
A variety of encoding methods for the bag-of-words (BoW) model have been proposed to encode local features in image classification. However, most of them are unsupervised and simply employ k-means to form the visual vocabulary, which reduces the discriminative power of the features. In this paper, we propose metric embedded discriminative vocabulary learning for high-level person representation, with application to person re-identification. A new and effective term is introduced that pulls images of the same person closer together and pushes images of different persons farther apart in the metric space. With the learned vocabulary, we use a linear coding method to encode image-level (holistic) features and extract a high-level person representation. Unlike traditional unsupervised approaches, our method can exploit the relationship (same person or not) between persons. Since the linear coding has an analytic solution, the final high-level features are easy to obtain. Experimental results on person re-identification demonstrate the effectiveness of the proposed algorithm.
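The abstract notes that the linear coding step has an analytic solution but does not state which coding is used. As an assumption, the sketch below uses ridge (L2-regularized least-squares) coding, a common linear coding whose closed form is $a = (D^\top D + \lambda I)^{-1} D^\top x$ for vocabulary $D$ and feature $x$. The learned metric term is not modelled, and the function names are hypothetical:

```python
from typing import List

Vector = List[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _solve(a: List[Vector], b: Vector) -> Vector:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_code(vocabulary: List[Vector], feature: Vector, lam: float = 0.1) -> Vector:
    """Closed-form code a = (D^T D + lam * I)^{-1} D^T x.

    `vocabulary` holds the K visual words (the columns of D), each of dimension d.
    """
    k = len(vocabulary)
    gram = [[_dot(vocabulary[i], vocabulary[j]) + (lam if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    rhs = [_dot(word, feature) for word in vocabulary]
    return _solve(gram, rhs)
```

Because the Gram matrix depends only on the vocabulary, it can be factorized once and reused for every image, which is what makes a closed-form code cheap to compute at scale.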
Large-Scale Graph-Based Semi-Supervised Learning via Tree Laplacian Solver
Zhang, Yan-Ming (Institute of Automation, Chinese Academy of Sciences) | Zhang, Xu-Yao (Institute of Automation, Chinese Academy of Sciences) | Yuan, Xiao-Tong (Nanjing University of Information Science and Technology) | Liu, Cheng-Lin (Institute of Automation, Chinese Academy of Sciences)
Graph-based semi-supervised learning is one of the most popular and successful semi-supervised learning methods. Typically, it predicts the labels of unlabeled data by minimizing a quadratic objective induced by the graph, which unfortunately has polynomial complexity in the sample size $n$. In this paper, we address this scalability issue by proposing a method that approximately solves the quadratic objective in nearly linear time. The method consists of two steps: it first approximates the graph by a minimum spanning tree, and then solves the tree-induced quadratic objective function in $O(n)$ time, which is the main contribution of this work. Extensive experiments show significant scalability improvements over existing scalable semi-supervised learning methods.
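The abstract states that the tree-induced quadratic objective can be solved in $O(n)$ time without giving its exact form. Assuming a common form, a smoothness term over the tree edges plus a squared fit to the labelled vertices, the minimiser satisfies the linear system $(L_T + C)f = Cy$, where $L_T$ is the tree Laplacian and $C$ is diagonal with $\gamma$ on labelled vertices and 0 elsewhere. Because a tree has no cycles, Gaussian elimination ordered from the leaves toward a root creates no fill-in, which gives a linear-time solve. The sketch below assumes that objective; the minimum-spanning-tree step is not shown, and the function name and `gamma` parameter are hypothetical:

```python
from collections import deque
from typing import Dict, List, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight)


def solve_tree_ssl(n: int, edges: List[Edge], labels: Dict[int, float],
                   gamma: float = 1.0, root: int = 0) -> List[float]:
    """Minimise sum_{(u,v) in T} w_uv (f_u - f_v)^2 + gamma * sum_{i labelled} (f_i - y_i)^2.

    Solves (L_T + C) f = C y in O(n) by eliminating vertices from the leaves
    toward `root`, then back-substituting. Assumes positive edge weights and
    at least one labelled vertex, so the system is nonsingular.
    """
    if len(edges) != n - 1 or not labels:
        raise ValueError("need a spanning tree (n - 1 edges) and >= 1 label")
    adj: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    diag = [sum(w for _, w in adj[v]) for v in range(n)]  # weighted degrees
    rhs = [0.0] * n
    for v, y in labels.items():
        diag[v] += gamma
        rhs[v] += gamma * y
    # BFS from the root gives parents; reversed BFS order visits children first.
    parent, pw = [-1] * n, [0.0] * n
    seen, order, queue = [False] * n, [], deque([root])
    seen[root] = True
    while queue:
        v = queue.popleft()
        order.append(v)
        for u, w in adj[v]:
            if not seen[u]:
                seen[u], parent[u], pw[u] = True, v, w
                queue.append(u)
    if len(order) != n:
        raise ValueError("edges do not connect all vertices")
    # Forward elimination: fold each vertex's equation into its parent's.
    for v in reversed(order[1:]):
        p, w = parent[v], pw[v]
        diag[p] -= w * w / diag[v]
        rhs[p] += w * rhs[v] / diag[v]
    # Back substitution from the root outward.
    f = [0.0] * n
    f[root] = rhs[root] / diag[root]
    for v in order[1:]:
        f[v] = (rhs[v] + pw[v] * f[parent[v]]) / diag[v]
    return f
```

On the path 0-1-2 with unit weights, labels `{0: 1.0, 2: 0.0}` and `gamma=1.0`, the solution is `[0.75, 0.5, 0.25]`, and the same answer comes out for any choice of `root`. Each vertex is touched a constant number of times per incident edge, so the cost is linear in $n$.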
Co-Occurrence Feature Learning for Skeleton Based Action Recognition Using Regularized Deep LSTM Networks
Zhu, Wentao (University of California, Irvine) | Lan, Cuiling (Microsoft Research Asia) | Xing, Junliang (Institute of Automation, Chinese Academy of Sciences) | Zeng, Wenjun (Microsoft Research Asia) | Li, Yanghao (Peking University) | Shen, Li (University of Chinese Academy of Sciences) | Xie, Xiaohui (University of California, Irvine)
Skeleton based action recognition distinguishes human actions using the trajectories of skeleton joints, which provide a very good representation for describing actions. Considering that recurrent neural networks (RNNs) with Long Short-Term Memory (LSTM) can learn feature representations and model long-term temporal dependencies automatically, we propose an end-to-end fully connected deep LSTM network for skeleton based action recognition. Inspired by the observation that the co-occurrences of the joints intrinsically characterize human actions, we take the skeleton as the input at each time slot and introduce a novel regularization scheme to learn the co-occurrence features of skeleton joints. To train the deep LSTM network effectively, we propose a new dropout algorithm which simultaneously operates on the gates, cells, and output responses of the LSTM neurons. Experimental results on three human action recognition datasets consistently demonstrate the effectiveness of the proposed model.
Large Scale Similarity Learning Using Similar Pairs for Person Verification
Yang, Yang (Institute of Automation, Chinese Academy of Sciences) | Liao, Shengcai (Institute of Automation, Chinese Academy of Sciences) | Lei, Zhen (Institute of Automation, Chinese Academy of Sciences) | Li, Stan Z. (Institute of Automation, Chinese Academy of Sciences)
In this paper, we propose a novel similarity measure and an efficient strategy to learn it using only similar pairs for person verification. Unlike existing metric learning methods, we consider both the difference and the commonness of an image pair to increase its discriminativeness. Under a pair-constrained Gaussian assumption, we show how to obtain the Gaussian priors (i.e., the corresponding covariance matrices) of dissimilar pairs from those of similar pairs. Applying a log-likelihood ratio makes the learning process simple and fast, and thus scalable to large datasets. Additionally, our method handles heterogeneous data well. Results on challenging face verification (LFW and PubFig) and person re-identification (VIPeR) datasets show that our algorithm outperforms state-of-the-art methods.
Predicting the Next Location: A Recurrent Model with Spatial and Temporal Contexts
Liu, Qiang (Institute of Automation, Chinese Academy of Sciences) | Wu, Shu (Institute of Automation, Chinese Academy of Sciences) | Wang, Liang (Institute of Automation, Chinese Academy of Sciences) | Tan, Tieniu (Institute of Automation, Chinese Academy of Sciences)
Spatial and temporal contextual information plays a key role in analyzing user behavior and helps predict where a user will go next. With the growing ability to collect information, more and more temporal and spatial contextual information is being collected, and the location prediction problem has become both crucial and feasible. Some works have addressed this problem, but they all have limitations. Factorizing Personalized Markov Chain (FPMC) relies on a strong independence assumption among different factors, which limits its performance. Tensor Factorization (TF) faces the cold-start problem in predicting future actions. Recurrent Neural Network (RNN) models show promising performance compared with FPMC and TF, but all these methods have difficulty modeling continuous time intervals and geographical distances. In this paper, we extend RNN and propose a novel method called Spatial Temporal Recurrent Neural Networks (ST-RNN). ST-RNN models local temporal and spatial contexts in each layer, using time-specific transition matrices for different time intervals and distance-specific transition matrices for different geographical distances. Experimental results show that ST-RNN yields significant improvements over the compared methods on two typical datasets, i.e., the Global Terrorism Database (GTD) and the Gowalla dataset.
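The abstract says ST-RNN uses time-specific and distance-specific transition matrices, yet a model can only learn finitely many matrices while intervals and distances are continuous. One way to bridge that gap, assumed here rather than taken from the abstract, is to learn matrices only at evenly spaced anchor values and linearly interpolate between the two nearest anchors. The function names and anchor layout are hypothetical:

```python
from typing import List

Matrix = List[List[float]]


def interpolate_transition(value: float, bucket_width: float,
                           anchors: List[Matrix]) -> Matrix:
    """Transition matrix for a continuous time interval or distance `value`.

    anchors[k] is the (hypothetical) learned matrix for value k * bucket_width.
    Values between two anchors get a linear blend of the two; values beyond the
    last anchor reuse it.
    """
    if value < 0 or bucket_width <= 0:
        raise ValueError("value must be >= 0 and bucket_width > 0")
    k = int(value // bucket_width)
    if k >= len(anchors) - 1:
        return [row[:] for row in anchors[-1]]
    lower = k * bucket_width
    w_upper = (value - lower) / bucket_width  # 0 at the lower anchor, 1 at the upper
    return [[(1.0 - w_upper) * a + w_upper * b for a, b in zip(row_lo, row_hi)]
            for row_lo, row_hi in zip(anchors[k], anchors[k + 1])]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Apply a transition matrix to a vector (e.g. a hidden state)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]
```

For example, with anchors at 0 h and 10 h, an interval of 2.5 h weights the 0 h matrix by 0.75 and the 10 h matrix by 0.25. In a recurrent step of this style, one would apply such a time-specific matrix to the previous hidden state and a distance-specific matrix, built the same way, to the current location's representation.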
Knowledge Graph Completion with Adaptive Sparse Transfer Matrix
Ji, Guoliang (Institute of Automation, Chinese Academy of Sciences) | Liu, Kang (Institute of Automation, Chinese Academy of Sciences) | He, Shizhu (Institute of Automation, Chinese Academy of Sciences) | Zhao, Jun (Institute of Automation, Chinese Academy of Sciences)
We model knowledge graphs for their completion by encoding each entity and relation into a numerical space. All previous work, including Trans(E, H, R, and D), ignores the heterogeneity (some relations link many entity pairs while others do not) and the imbalance (the numbers of head and tail entities in a relation may differ) of knowledge graphs. In this paper, we propose a novel approach, TranSparse, to deal with these two issues. In TranSparse, transfer matrices are replaced by adaptive sparse matrices, whose sparse degrees are determined by the number of entities (or entity pairs) linked by relations. In experiments, we design structured and unstructured sparse patterns for transfer matrices and analyze their advantages and disadvantages. We evaluate our approach on triplet classification and link prediction tasks. Experimental results show that TranSparse significantly outperforms Trans(E, H, R, and D) and achieves state-of-the-art performance.
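The abstract states that each relation's sparse degree is set by the number of entities (or entity pairs) it links, without giving the formula. A minimal sketch, assuming the degree falls linearly from 1 toward a floor `theta_min` as a relation's link count approaches the largest count among all relations, and assuming a translation-style score with separate head and tail transfer matrices; the names and the exact formula are assumptions:

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sparse_degree(n_links: int, max_links: int, theta_min: float) -> float:
    """Assumed fraction of zero entries in a relation's transfer matrix.

    Relations linking more entities or pairs get denser (less sparse)
    matrices; the busiest relation gets exactly theta_min.
    """
    if not 0 < n_links <= max_links:
        raise ValueError("need 0 < n_links <= max_links")
    return 1.0 - (1.0 - theta_min) * n_links / max_links


def nonzero_budget(theta: float, rows: int, cols: int) -> int:
    """Number of entries left nonzero at sparse degree theta."""
    return round((1.0 - theta) * rows * cols)


def _matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def score(h: Vector, r: Vector, t: Vector, m_head: Matrix, m_tail: Matrix) -> float:
    """Squared L2 distance ||M_h h + r - M_t t||^2; lower means more plausible."""
    ph, pt = _matvec(m_head, h), _matvec(m_tail, t)
    return sum((a + b - c) ** 2 for a, b, c in zip(ph, r, pt))
```

With `theta_min = 0.3`, a relation linking 10 entity pairs when the busiest relation links 100 gets sparse degree 0.93, so a 4x4 transfer matrix would keep only `round(0.07 * 16) = 1` nonzero entry. Which entries stay nonzero (the structured versus unstructured patterns mentioned above) is not modelled here.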
MC-HOG Correlation Tracking with Saliency Proposal
Zhu, Guibo (Institute of Automation, Chinese Academy of Sciences) | Wang, Jinqiao (Institute of Automation, Chinese Academy of Sciences) | Wu, Yi (Nanjing University of Information Science and Technology) | Zhang, Xiaoyu (Chinese Academy of Sciences) | Lu, Hanqing (Institute of Automation, Chinese Academy of Sciences)
Designing effective features and handling the model drift problem are two important aspects of online visual tracking. For feature representation, gradient and color features are the most widely used, but how to combine them effectively for visual tracking is still an open problem. In this paper, we propose a rich feature descriptor, MC-HOG, which leverages rich gradient information across multiple color channels or spaces. MC-HOG features are then embedded into the correlation tracking framework to estimate the state of the target. To handle the model drift problem caused by occlusion or distractors, we propose saliency proposals as prior information to provide candidates and reduce background interference. In addition to saliency proposals, a ranking strategy is proposed to determine the importance of these proposals by exploiting the learned appearance filter, historically preserved object samples, and the distracting proposals. In this way, the proposed approach effectively exploits color-gradient characteristics and alleviates the model drift problem. Extensive evaluations on the benchmark dataset show the superiority of the proposed method.
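The abstract embeds MC-HOG features into a correlation tracking framework. For readers unfamiliar with correlation filters, here is a minimal 1-D, single-channel sketch of a generic MOSSE-style filter: it is solved in the Fourier domain so that correlating it with the target template yields a Gaussian peak, and the peak's position on a new patch gives the target's displacement. MC-HOG features, multi-channel fusion, saliency proposals, and the ranking strategy are not modelled, and all names and parameters (`sigma`, `lam`) are assumptions:

```python
import cmath
import math
from typing import List


def dft(x: List[complex], inverse: bool = False) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform (keeps the sketch dependency-free)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [sum(x[k] * cmath.exp(sign * 2j * math.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out


def train_filter(template: List[float], sigma: float = 2.0,
                 lam: float = 1e-2) -> List[complex]:
    """Conjugate filter H* = G F* / (F F* + lam) for a single training patch."""
    n = len(template)
    # Desired response: a Gaussian peaked at zero displacement (cyclic distance).
    g = [math.exp(-min(i, n - i) ** 2 / (2 * sigma ** 2)) for i in range(n)]
    f_hat, g_hat = dft(template), dft(g)
    return [gk * fk.conjugate() / (fk * fk.conjugate() + lam)
            for gk, fk in zip(g_hat, f_hat)]


def locate(filter_conj: List[complex], patch: List[float]) -> int:
    """Return the cyclic displacement with the highest filter response."""
    z_hat = dft(patch)
    response = dft([h * z for h, z in zip(filter_conj, z_hat)], inverse=True)
    real = [v.real for v in response]
    return max(range(len(real)), key=real.__getitem__)
```

If a patch is the template cyclically shifted by `s` samples, the response is (a regularized copy of) the Gaussian shifted by `s`, so `locate` returns `s`. For 2-D image patches the same per-frequency formula applies with a 2-D FFT, and multi-channel variants typically sum the per-channel terms; the naive DFT is used here only for self-containment.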