Asia
Multi-Label Structure Learning with Ising Model Selection
Goncalves, Andre R. (University of Campinas) | Zuben, Fernando J. Von (University of Campinas) | Banerjee, Arindam (University of Minnesota, Twin Cities)
A common way of attacking multi-label classification problems is by splitting it into a set of binary classification problems, then solving each problem independently using traditional single-label methods. Nevertheless, by learning classifiers separately the information about the relationship between labels tends to be neglected. Built on recent advances in structure learning in Ising Markov Random Fields (I-MRF), we propose a multi-label classification algorithm that explicitly estimate and incorporate label dependence into the classifiers learning process by means of a sparse convex multi-task learning formulation.Extensive experiments considering several existing multi-label algorithms indicate that the proposed method, while conceptually simple, outperforms the contenders in several datasets and performance metrics. Besides that, the conditional dependence graph encoded in the I-MRF provides a useful information that can be used in a posterior investigation regarding the reasons behind the relationship between labels.
Pre-release Prediction of Crowd Opinion on Movies by Label Distribution Learning
Geng, Xin (Southeast University) | Hou, Peng (Southeast University)
This paper studies an interesting problem: is it possible to predict the crowd opinion about a movie before the movie is actually released? The crowd opinion is here expressed by the distribution of ratings given by a sufficient amount of people. Consequently, the pre-release crowd opinion prediction can be regarded as a Label Distribution Learning (LDL) problem. In order to solve this problem, a Label Distribution Support Vector Regressor (LDSVR) is proposed in this paper. The basic idea of LDSVR is to fit a sigmoid function to each component of the label distribution simultaneously by a multi-output support vector machine. Experimental results show that LDSVR can accurately predict peoplesโs rating distribution about a movie just based on the pre-release metadata of the movie.
Quiet: Faster Belief Propagation for Images and Related Applications
Fujiwara, Yasuhiro (Nippon Telegraph and Telephone Corporation) | Shasha, Dennis (New York University)
Belief propagation over Markov random fields has been successfully used in many AI applications since it yields accurate inference results by iteratively updating messages between nodes. However, its high computation costs are a barrier to practical use. This paper presents an efficient approach to belief propagation. Our approach, Quiet, dynamically detects converged messages to skip unnecessary updates in each iteration while it theoretically guarantees to output the same results as the standard approach used to implement belief propagation. Experiments show that our approach is significantly faster than existing approaches without sacrificing inference quality.
Random Feature Mapping with Signed Circulant Matrix Projection
Feng, Chang (Tianjin University) | Hu, Qinghua (Tianjin University) | Liao, Shizhong (Tianjin University)
Random feature mappings have been successfully used for approximating non-linear kernels to scaleup kernel methods. Some work aims at speeding up the feature mappings, but brings increasing variance of the approximation. In this paper, we propose a novel random feature mapping method that uses a signed Circulant Random Matrix (CRM) instead of an unstructured random matrix to project input data. The signed CRM has linear space complexity as the whole signed CRM can be recovered from one column of the CRM, and ensures loglin-ear time complexity to compute the feature mapping using the Fast Fourier Transform (FFT). Theoretically, we prove that approximating Gaussian kernel using our mapping method is unbiased and does not increase the variance. Experimentally, we demonstrate that our proposed mapping method is time and space efficient while retaining similar accuracies with state-of-the-art random feature mapping methods. Our proposed random feature mapping method can be implemented easily and make kernel methods scalable and practical for large scale training and predicting problems.
Crowdsourced Semantic Matching of Multi-Label Annotations
Duan, Lei (Hokkaido University) | Oyama, Satoshi (Hokkaido University) | Kurihara, Masahito (Hokkaido University) | Sato, Haruhiko (Hokkaido University)
Most multi-label domains lack an authoritative taxonomy. Therefore, different taxonomies are commonly used in the same domain, which results in complications. Although this situation occurs frequently, there has been little study of it using a principled statistical approach. Given that (1) different taxonomies used in the same domain are generally founded on the same latent semantic space, where each possible label set in a taxonomy denotes a single semantic concept, and that (2) crowdsourcing is beneficial in identifying relationships between semantic concepts and instances at low cost, we proposed a novel probabilistic cascaded method for establishing a semantic matching function in a crowdsourcing setting that maps label sets in one (source) taxonomy to label sets in another (target) taxonomy in terms of the semantic distances between them. The established function can be used to detect the associated label set in the target taxonomy for an instance directly from its associated label set in the source taxonomy without any extra effort. Experimental results on real-world data (emotion annotations for narrative sentences) demonstrated that the proposed method can robustly establish semantic matching functions exhibiting satisfactory performance from a limited number of crowdsourced annotations.
Robust Multiple Kernel K-means Using L21-Norm
Du, Liang (Chinese Academy of Sciences and Shanxi University) | Zhou, Peng (Chinese Academy of Sciences) | Shi, Lei (Chinese Academy of Sciences) | Wang, Hanmo (Chinese Academy of Sciences) | Fan, Mingyu (Wenzhou University) | Wang, Wenjian (Shanxi University) | Shen, Yi-Dong (Chinese Academy of Sciences)
The k-means algorithm is one of the most often used method for data clustering. However, the standard k-means can only be applied in the original feature space. The kernel k-means, which extends k-means into the kernel space, can be used to capture the non-linear structure and identify arbitrarily shaped clusters. Since both the standard k-means and kernel k-means apply the squared error to measure the distances between data points and cluster centers, a few outliers will cause large errors and dominate the objection function. Besides, the performance of kernel method is largely determined by the choice of kernel. Unfortunately, the most suitable kernel for a particular task is often unknown in advance. In this paper, we first present a robust k-means using l 2,1 -norm in the feature space and then extend it to the kernel space. To recap the powerfulness of kernel methods, we further propose a novel robust multiple kernel k-means (RMKKM) algorithm that simultaneously finds the best clustering label, the cluster membership and the optimal combination of multiple kernels. An alternating iterative schema is developed to find the optimal value. Extensive experiments well demonstrate the effectiveness of the proposed algorithms.
Topic Modeling with Document Relative Similarities
Du, Jianguang (Beijing Institute of Technology) | Jiang, Jing (Singapore Management University) | Song, Dandan (Beijing Institute of Technology) | Liao, Lejian (Beijing Institute of Technology)
Topic modeling has been widely used in text mining. Previous topic models such as Latent Dirichlet Allocation (LDA) are successful in learning hidden topics but they do not take into account metadata of documents. To tackle this problem, many augmented topic models have been proposed to jointly model text and metadata. But most existing models handle only categorical and numerical types of metadata. We identify another type of metadata that can be more natural to obtain in some scenarios. These are relative similarities among documents. In this paper, we propose a general model that links LDA with constraints derived from document relative similarities. Specifically, in our model, the constraints act as a regularizer of the log likelihood of LDA. We fit the proposed model using Gibbs-EM. Experiments with two real world datasets show that our model is able to learn meaningful topics. The results also show that our model outperforms the baselines in terms of topic coherence and a document classification task.
Multi-View Matrix Decomposition: A New Scheme for Exploring Discriminative Information
Deng, Cheng (Xidian University) | Lv, Zongting (Xidian University) | Liu, Wei (IBM T. J. Watson Research Center) | Huang, Junzhou (University of Texas at Arlington) | Tao, Dacheng (University of Technology, Sydney) | Gao, Xinbo (Xidian University)
Recent studies have demonstrated the advantages of fusing information from multiple views for various machine learning applications. However, most existing approaches assumed the shared component common to all views and ignored the private components of individual views, which thereby restricts the learning performance. In this paper, we propose a new multi-view, low-rank, and sparse matrix decomposition scheme to seamlessly integrate diverse yet complementary information stemming from multiple views. Unlike previous approaches, our approach decomposes an input data matrix concatenated from multiple views as the sum of low-rank, sparse, and noisy parts. Then a unified optimization framework is established, where the low-rankness and group-structured sparsity constraints are imposed to simultaneously capture the shared and private components in both instance and view levels. A proven optimization algorithm is developed to solve the optimization, yielding the learned augmented representation which is used as features for classification tasks. Extensive experiments conducted on six benchmark image datasets show that our approach enjoys superior performance over the state-of-the-art approaches.
Optimal Bayesian Hashing for Efficient Face Recognition
Dai, Qi (Fudan University) | Li, Jianguo (Intel Corporation) | Wang, Jun (Alibaba Group) | Chen, Yurong (Intel Corporation) | Jiang, Yu-Gang (Fudan University)
In practical applications, it is often observed that high-dimensional features can yield good performance, while being more costly in both computation and storage. In this paper, we propose a novel method called Bayesian Hashing to learn an optimal Hamming embedding of high-dimensional features, with a focus on the challenging application of face recognition. In particular, a boosted random FERNs classification model is designed to perform efficient face recognition, in which bit correlations are elaborately approximated with a random permutation technique. Without incurring additional storage cost, multiple random permutations are then employed to train a series of classifiers for achieving better discrimination power. In addition, we introduce a sequential forward floating search (SFFS) algorithm to perform model selection, resulting in further performance improvement. Extensive experimental evaluations and comparative studies clearly demonstrate that the proposed Bayesian Hashing approach outperforms other peer methods in both accuracy and speed. We achieve state-of-the-art results on well-known face recognition benchmarks using compact binary codes with significantly reduced computational overload and storage cost.
Robust Learning for Repeated Stochastic Games via Meta-Gaming
Crandall, Jacob W. (Masdar Institute of Science and Technology)
In repeated stochastic games (RSGs), an agent must quickly adapt to the behavior of previously unknown associates, who may themselves be learning. This machine-learning problem is particularly challenging due, in part, to the presence of multiple (even infinite) equilibria and inherently large strategy spaces. In this paper, we introduce a method to reduce the strategy space of two-player general-sum RSGs to a handful of expert strategies. This process, called mega, effectually reduces an RSG to a bandit problem. We show that the resulting strategy space preserves several important properties of the original RSG, thus enabling a learner to produce robust strategies within a reasonably small number of interactions. To better establish strengths and weaknesses of this approach, we empirically evaluate the resulting learning system against other algorithms in three different RSGs.