Asia
Optimal Bayesian Hashing for Efficient Face Recognition
Dai, Qi (Fudan University) | Li, Jianguo (Intel Corporation) | Wang, Jun (Alibaba Group) | Chen, Yurong (Intel Corporation) | Jiang, Yu-Gang (Fudan University)
In practical applications, it is often observed that high-dimensional features can yield good performance, while being more costly in both computation and storage. In this paper, we propose a novel method called Bayesian Hashing to learn an optimal Hamming embedding of high-dimensional features, with a focus on the challenging application of face recognition. In particular, a boosted random FERNs classification model is designed to perform efficient face recognition, in which bit correlations are elaborately approximated with a random permutation technique. Without incurring additional storage cost, multiple random permutations are then employed to train a series of classifiers for achieving better discrimination power. In addition, we introduce a sequential forward floating search (SFFS) algorithm to perform model selection, resulting in further performance improvement. Extensive experimental evaluations and comparative studies clearly demonstrate that the proposed Bayesian Hashing approach outperforms other peer methods in both accuracy and speed. We achieve state-of-the-art results on well-known face recognition benchmarks using compact binary codes with significantly reduced computational overload and storage cost.
Robust Learning for Repeated Stochastic Games via Meta-Gaming
Crandall, Jacob W. (Masdar Institute of Science and Technology)
In repeated stochastic games (RSGs), an agent must quickly adapt to the behavior of previously unknown associates, who may themselves be learning. This machine-learning problem is particularly challenging due, in part, to the presence of multiple (even infinite) equilibria and inherently large strategy spaces. In this paper, we introduce a method to reduce the strategy space of two-player general-sum RSGs to a handful of expert strategies. This process, called mega, effectually reduces an RSG to a bandit problem. We show that the resulting strategy space preserves several important properties of the original RSG, thus enabling a learner to produce robust strategies within a reasonably small number of interactions. To better establish strengths and weaknesses of this approach, we empirically evaluate the resulting learning system against other algorithms in three different RSGs.
Efficient Generalized Conditional Gradient with Gradient Sliding for Composite Optimization
Cheung, Yiu-ming (Hong Kong Baptist University) | Lou, Jian (Hong Kong Baptist University)
For particular sparse optimization problems, among them is the popular tasks, its low computation cost of linear subproblem proximal gradient (PG) based approach ([Beck and Teboulle, evaluation on each iteration leads to superior 2009]; [Nesterov, 2013]). This kind of methods can achieve practical performance. However, the inferior iteration the optimal rate of convergence under certain problem settings, complexity incurs excess number of gradient hence it enjoys low iteration complexity. The periteration evaluations, which can counteract the efficiency cost mainly comes from gradient evaluation and a gained by solving low cost linear subproblem. In proximal map (PM) related to the type of the regularizer. On this paper, we therefore propose a novel algorithm the one hand, due to the optimal iteration complexity, the that requires optimal graduate evaluations as proximal number of gradient evaluations is optimal for PG method.
Mirror Representation for Modeling View-Specific Transform in Person Re-Identification
Chen, Ying-Cong (Sun Yat-sen University) | Zheng, Wei-Shi (Sun Yat-sen University) | Lai, Jianhuang (Sun Yat-sen University)
Person re-identification concerns the matching of pedestrians across disjoint camera views. Due to the changes of viewpoints, lighting conditions and camera features, images of the same person from different views always appear differently, and thus feature representations across disjoint camera views of the same person follow different distributions. In this work, we propose an effective, low cost and easy-to-apply schema called the Mirror Representation, which embeds the view-specific feature transformation and enables alignment of the feature distributions across disjoint views for the same person. The proposed Mirror Representation is also designed to explicitly model the relation between different view-specific transformations and meanwhile control their discrepancy. With our Mirror Representation, we can enhance existing subspace/metric learning models significantly, and we particularly show that kernel marginal fisher analysis significantly outperforms the current state-of-the-art methods through extensive experiments on VIPeR, PRID450S and CUHK01.
Training-Efficient Feature Map for Shift-Invariant Kernels
Chen, Xixian (The Chinese University of Hong Kong) | Yang, Haiqin (The Chinese University of Hong Kong) | King, Irwin (The Chinese University of Hong Kong) | Lyu, Michael R. (The Chinese University of Hong Kong)
Random feature map is popularly used to scale up kernel methods. However, employing a large number of mapped features to ensure an accurate approximation will still make the training time consuming. In this paper, we aim to improve the training efficiency of shift-invariant kernels by using fewer informative features without sacrificing precision. We propose a novel feature map method by extending Random Kitchen Sinks through fast data-dependent subspace embedding to generate the desired features. More specifically, we describe two algorithms with different tradeoffs on the running speed and accuracy, and prove that O(l) features induced by them are able to perform as accurately as O(l 2 ) features by other feature map methods. In addition, several experiments are conducted on the real-world datasets demonstrating the superiority of our proposed algorithms.
Model Metric Co-Learning for Time Series Classification
Chen, Huanhuan (University of Science and Technology of China) | Tang, Fengzhen (University of Birmingham) | Tino, Peter (University of Birmingham) | Cohn, Anthony G. (University of Leeds) | Yao, Xin (University of Birmingham)
We present a novel model-metric co-learning (MMCL) methodology for sequence classification which learns in the model space -- each data item (sequence) is represented by a predictive model from a carefully designed model class. MMCL learning encourages sequences from the same class to be represented by โcloseโ model representations, well separated from those for different classes. Existing approaches to the problem either fit a single model to all the data, or a (predominantly linear) model on each sequence. We introduce a novel hybrid approach spanning the two extremes. The model class we use is a special form of adaptive high-dimensional non-linear state space model with a highly constrained and simple dynamic part. The dynamic part is identical for all data items and acts as a temporal filter providing a rich pool of dynamic features that can be selectively extracted by individual (static) linear readout mappings representing the sequences. Alongside learning the dynamic part, we also learn the global metric in the model readout space. Experiments on synthetic and benchmark data sets confirm the effectiveness of the algorithm compared to a variety of alternative methods.
Compressed Spectral Regression for Efficient Nonlinear Dimensionality Reduction
Cai, Deng (Zhejiang University)
Spectral dimensionality reduction methods have recently emerged as powerful tools for various applications in pattern recognition, data mining and computer vision. These methods use information contained in the eigenvectors of a data affinity (i.e, item-item similarity) matrix to reveal the low dimensional structure of the high dimensional data. One of the limitations of various spectral dimensionality reduction methods is their high computational complexity. They all need to construct a data affinity matrix and compute the top eigenvectors. This leads to O(n2) computational complexity, where n is the number of samples. Moreover, when the data are highly non-linear distributed, some linear methods have to be performed in a reproducing kernel Hilbert space (leads to the corresponding kernel methods) to learn an effective non-linear mapping. The computational complexity of these kernel methods is O(n3). In this paper, we propose a novel nonlinear dimensionality reduction algorithm, called Compressed Spectral Regression, with O(n) computational complexity. Extensive experiments on data clustering demonstrate the effectiveness and efficiency of the proposed approach.
Count-Based Frequency Estimation with Bounded Memory
Bellemare, Marc G. (Google DeepMind)
Count-based estimators are a fundamental building block of a number of powerful sequential prediction algorithms, including Context Tree Weighting and Prediction by Partial Matching. Keeping exact counts, however, typically results in a high memory overhead. In particular, when dealing with large alphabets the memory requirements of count-based estimators often become prohibitive. In this paper we propose three novel ideas for approximating count-based estimators using bounded memory. Our first contribution, of independent interest, is an extension of reservoir sampling for sampling distinct symbols from a stream of unknown length, which we call K-distinct reservoir sampling. We combine this sampling scheme with a state-of-the-art count-based estimator for memoryless sources, the Sparse Adaptive Dirichlet (SAD) estimator. The resulting algorithm, the Budget SAD, naturally guarantees a limit on its memory usage. We finally demonstrate the broader use of K-distinct reservoir sampling in nonparametric estimation by using it to restrict the branching factor of the Context Tree Weighting algorithm. We demonstrate the usefulness of our algorithms with empirical results on two sequential, large-alphabet prediction problems.
A Graph Kernel Based on the Jensen-Shannon Representation Alignment
Bai, Lu (Central University of Finance and Economics and University of York) | Zhang, Zhihong (Xiamen University) | Wang, Chaoyan (University of ย Nottingham) | Bai, Xiao (Beihang University) | Hancock, Edwin (University of York)
In this paper, we develop a novel graph kernel by aligning the Jensen-Shannon (JS) representations of vertices. We commence by describing how to compute the JS representation of a vertex by measuring the JS divergence (JSD) between the corresponding $-layer depth-based (DB) representations developed. By aligning JS representations of vertices, we identify the correspondence between the vertices of two graphs and this allows us to construct a matching-based graph kernel. Unlike existing R-convolution kernels that roughly record the isomorphism information between any pair of substructures under a type of graph decomposition, the new kernel can be seen as an aligned subgraph kernel that incorporates explicit local correspondences of substructures i.e., the local information graphs) into the process of kernelization through the JS representation alignment. The new kernel thus addresses the drawback of neglecting the relative locations between substructures that arises in the R-convolution kernels. Experiments demonstrate that our kernel can easily outperform state-of-the-art graph kernels in terms of the classification accuracies.
Characterizing Causal Action Theories and Their Implementations in Answer Set Programming: Action Languages B, C, and Beyond
Zhang, Haodi (HK University of Science and Technology) | Lin, Fangzhen (HK University of Science and Technology)
We consider a simple language for writing causal action theories, and postulate several properties for the state transition models of these theories. We then consider some possible embeddings of these causal action theories in some other action formalisms, and their implementations in logic programs with answer set semantics. In particular, we propose to consider what we call permissible translations from these causal action theories to logic programs. We identify two sets of properties, and prove that for each set, there is only one permissible translation, under strong equivalence, that can satisfy all properties in the set. As it turns out, for one set, the unique permissible translation is essentially the same as Balduccini and Gelfond's translation from Gelfond and Lifschitz's action language B to logic programs. For the other, it is essentially the same as Lifschitz and Turner's translation from the action language C to logic programs. This work provides a new perspective on understanding, evaluating and comparing action languages by using sets of properties instead of examples. It will be interesting to see if other action languages can be similarly characterized, and whether new action formalisms can be defined using different sets of properties.