Statistical Learning
Multi-Graph-View Learning for Complicated Object Classification
Wu, Jia (University of Technology, Sydney) | Pan, Shirui (University of Technology, Sydney) | Zhu, Xingquan (Florida Atlantic University) | Cai, Zhihua (China University of Geosciences, Wuhan) | Zhang, Chengqi (University of Technology, Sydney)
In this paper, we propose to represent and classify complicated objects. In order to represent the objects, we propose a multi-graph-view model which uses graphs constructed from multiple graph-views to represent an object. In addition, a bag based multi-graph model is further used to relax labeling by only requiring one label for a bag of graphs, which represent one object. In order to learn classification models, we propose a multi-graph-view bag learning algorithm (MGVBL), which aims to explore subgraph features from multiple graph-views for learning. By enabling a joint regularization across multiple graph-views, and enforcing labeling constraints at the bag and graph levels, MGVBL is able to discover most effective subgraph features across all graph-views for learning. Experiments on real-world learning tasks demonstrate the performance of MGVBL for complicated object classification.
A Joint Optimization Framework of Sparse Coding and Discriminative Clustering
Wang, Zhangyang (University of Illinois at Urbana-Champaign) | Yang, Yingzhen (University of Illinois at Urbana-Champaign) | Chang, Shiyu (University of Illinois at Urbana-Champaign) | Li, Jinyan (University of Macau) | Fong, Simon (University of Macau) | Huang, Thomas S (University of Illinois at Urbana-Champaign)
Many clustering methods highly depend on extracted features. In this paper, we propose a joint optimization framework in terms of both feature extraction and discriminative clustering. We utilize graph regularized sparse codes as the features, and formulate sparse coding as the constraint for clustering. Two cost functions are developed based on entropy-minimization and maximum-margin clustering principles, respectively, as the objectives to be minimized. Solving such a bi-level optimization mutually reinforces both sparse coding and clustering steps. Experiments on several benchmark datasets verify remarkable performance improvements led by the proposed joint optimization.
Discriminative Unsupervised Dimensionality Reduction
Wang, Xiaoqian (University of Texas at Arlington) | Liu, Yun (University of Texas at Arlington) | Nie, Feiping (University of Texas at Arlington) | Huang, Heng (University of Texas at Arlington)
As an important machine learning topic, dimensionality reduction has been widely studied and utilized in various kinds of areas. A multitude of dimensionality reduction methods have been developed, among which unsupervised dimensionality reduction is more desirable when obtaining label information requires onerous work. However, most previous unsupervised dimensionality reduction methods call for an affinity graph constructed beforehand, with which the following dimensionality reduction steps can be then performed. Separation of graph construction and dimensionality reduction leads the dimensionality reduction process highly dependent on quality of the input graph. In this paper, we propose a novel graph embedding method for unsupervised dimensionality reduction. We simultaneously conduct dimensionality reduction along with graph construction by assigning adaptive and optimal neighbors according to the projected local distances. Our method doesn’t need an affinity graph constructed in advance, but instead learns the graph concurrently with dimensionality reduction. Thus, the learned graph is optimal for dimensionality reduction. Meanwhile, our learned graph has an explicit block diagonal structure, from which the clustering results could be directly revealed without any postprocessing steps. Extensive empirical results on dimensionality reduction as well as clustering are presented to corroborate the performance of our method.
A Soft Version of Predicate Invention Based on Structured Sparsity
Wang, William Yang (Carnegie Mellon University) | Mazaitis, Kathryn (Carnegie Mellon University) | Cohen, William W. (Carnegie Mellon University)
In predicate invention (PI), new predicates are introduced into a logical theory, usually by rewriting a group of closely-related rules to use a common invented predicate as a "subroutine". PI is difficult, since a poorly-chosen invented predicate may lead to error cascades. Here we suggest a "soft" version of predicate invention: instead of explicitly creating new predicates, we implicitly group closely-related rules by using structured sparsity to regularize their parameters together. We show that soft PI, unlike hard PI, consistently improves over previous strong baselines for structure-learning on two large-scale tasks.
An Efficient Classifier Based on Hierarchical Mixing Linear Support Vector Machines
Wang, Di (Wenzhou University) | Zhang, Xiaoqin (Wenzhou University) | Fan, Mingyu (Wenzhou University) | Ye, Xiuzi (Wenzhou University)
SVM in advance, and this limits their applications to largescale problems. To address this issue, several methods for Support vector machines (SVMs) play a very dominant selecting a set of basis vectors are proposed. They include role in data classification due to their good sampling from the training set in the Nystrom method generalization performance. However, they suffer [Williams and Seeger, 2001] and variants of the Incomplete from the high computational complexity in the Cholesky factorization [Bach and Jordan, 2005], core vector classification phase when there are a considerable machine (CVM) [Tsang et al., 2005], relevance vector machine number of support vectors (SVs). Then it is desirable (RVM)[Tipping, 2001], and relevance units machine to design efficient algorithms in the classification (RUM)[Gao and Zhang, 2009]. Wu et al. [Wu et al., 2006] phase to deal with the datasets of realtime add one constraint on the number of basis vectors to the standard pattern recognition systems. To this end, we SVM optimization problem, and then solve this modified propose a novel classifier called HMLSVMs (Hierarchical nonconvex problem to build sparse kernel learning algorithms Mixing Linear Support Vector Machines) (SKLA). Joachims and Yu [Joachims and Yu, 2009] in this paper, which has a hierarchical structure explore a new sparse kernel SVMs via cutting plane training, with a mixing linear SVMs classifier at each node called cutting-plane subspace pursuit (CPSP).Although and predicts the label of a sample using only a the above methods prunes the SVs and reduces computational few hyperplanes. We also give a generalization complexity in classification phase, when a new test sample is error bound for the class of locally linear SVMs introduced, they still need to compare it with these pruned (LLSVMs) based on the Rademacher theory, which SVs via kernel calculations to predict the label of the test ensures that overfitting can be effectively avoided.
Constrained Information-Theoretic Tripartite Graph Clustering to Identify Semantically Similar Relations
Wang, Chenguang (Peking University) | Song, Yangqiu (University of Illinois at Urbana-Champaign) | Roth, Dan (University of Illinois at Urbana-Champaign) | Wang, Chi (Microsoft Research) | Han, Jiawei (University of Illinois at Urbana-Champaign) | Ji, Heng (Rensselaer Polytechnic Institute) | Zhang, Ming (Peking University)
In knowledge bases or information extraction results, differently expressed relations can be semantically similar (e.g., (X, wrote, Y) and (X,’s written work, Y)). Therefore, grouping semantically similar relations into clusters would facilitate and improve many applications, including knowledge base completion, information extraction, information retrieval, and more. This paper formulates relation clustering as a constrained tripartite graph clustering problem, presents an efficient clustering algorithm and exhibits the advantage of the constrained framework. We introduce several ways that provide side information via must-link and cannot link constraints to improve the clustering results. Different from traditional semi-supervised learning approaches, we propose to use the similarity of relation expressions and the knowledge of entity types to automatically construct the constraints for the algorithm. We show improved relation clustering results on two datasets extracted from human annotated knowledge base (i.e., Freebase) and open information extraction results (i.e., ReVerb data).
Feature Selection from Microarray Data via an Ordered Search with Projected Margin
Villela, Saulo Moraes (Federal University of Juiz de Fora) | Leite, Saul de Castro (Federal University of Juiz de Fora) | Neto, Raul Fonseca (Federal University of Juiz de Fora)
Microarray experiments are capable of measuring the expression level of thousands of genes simultaneously. Dealing with this enormous amount of information requires complex computation. Support Vector Machines (SVM) have been widely used with great efficiency to solve classification problems that have high dimension. In this sense, it is plausible to develop new feature selection strategies for microarray data that are associated with this type of classifier. Therefore, we propose, in this paper, a new method for feature selection based on an ordered search process to explore the space of possible subsets. The algorithm, called Admissible Ordered Search (AOS), uses as evaluation function the margin values estimated for each hypothesis by a SVM classifier. An important theoretical contribution of this paper is the development of the projected margin concept. This value is computed as the margin vector projection on a lower dimensional subspace and is used as an upper bound for the current value of the hypothesis in the search process. This enables great economy in runtime and consequently efficiency in the search process as a whole. The algorithm was tested using five different microarray data sets yielding superior results when compared to three representative feature selection methods.
Sketch the Storyline with CHARCOAL: A Non-Parametric Approach
Tang, Siliang (Zhejiang University) | Wu, Fei (Zhejiang University) | Li, Si (Zhejiang University) | Lu, Weiming (Zhejiang University) | Zhang, Zhongfei (Zhejiang University) | Zhuang, Yueting (Zhejiang University)
Generating a coherent synopsis and revealing the development threads for news stories from the increasing amounts of news content remains aformidable challenge. In this paper, we proposed a hddCRP (hybird distant-dependent ChineseRestaurant Process) based HierARChical tOpic model for news Article cLustering, abbreviated as CHARCOAL. Given a bunch of news articles, the outcome of CHARCOAL is threefold: 1) it aggregates relevant new articles into clusters (i.e., stories); 2) it disentangles the chain links (i.e., storyline) between articles in their describing story; 3) it discerns the topics that each story is assigned (e.g., Malaysia Airlines Flight 370 story belongs to the aircraft accident topic and U.S presidential election stories belong to the politics topic). CHARCOAL completes this task by utilizing a hddCRP as prior, and the entities (e.g., names of persons, organizations, or locations) that appear in news articles as clues. Moveover, the adaptation of nonparametric nature in CHARCOAL makes our model can adaptively learn the appropriate number of stories and topics from news corpus. The experimental analysis and results demonstrate both interpretability and superiority of the proposed approach.
Polytree-Augmented Classifier Chains for Multi-Label Classification
Sun, Lu (Hokkaido University) | Kudo, Mineichi (Hokkaido University)
Multi-label classification is a challenging and appealing supervised learning problem where a subset of labels, rather than a single label seen in traditional classification problems, is assigned to a single test instance. Classifier chains based methods are a promising strategy to tackle multi-label classification problems as they model label correlations at acceptable complexity. However, these methods are difficult to approximate the underlying dependency in the label space, and suffer from the problems of poorly ordered chain and error propagation. In this paper, we propose a novel polytree-augmented classifier chains method to remedy these problems. A polytree is used to model reasonable conditional dependence between labels over attributes, under which the directional relationship between labels within causal basins could be appropriately determined. In addition, based on the max-sum algorithm, exact inference would be performed on polytrees at reasonable cost, preventing from error propagation. The experiments performed on both artificial and benchmark multi-label data sets demonstrated that the proposed method is competitive with the state-of-the-art multi-label classification methods.
Semi-Orthogonal Multilinear PCA with Relaxed Start
Shi, Qiquan (Hong Kong Baptist University) | Lu, Haiping (Hong Kong Baptist University)
Principal component analysis (PCA) is an unsupervised method for learning low-dimensional features with orthogonal projections. Multilinear PCA methods extend PCA to deal with multidimensional data (tensors) directly via tensor-to-tensor projection or tensor-to-vector projection (TVP). However, under the TVP setting, it is difficult to develop an effective multilinear PCA method with the orthogonality constraint. This paper tackles this problem by proposing a novel Semi-Orthogonal Multilinear PCA (SO-MPCA) approach. SO-MPCA learns low-dimensional features directly from tensors via TVP by imposing the orthogonality constraint in only one mode. This formulation results in more captured variance and more learned features than full orthogonality. For better generalization, we further introduce a relaxed start (RS) strategy to get SO-MPCA-RS by fixing the starting projection vectors, which increases the bias and reduces the variance of the learning model. Experiments on both face (2D) and gait (3D) data demonstrate that SO-MPCA-RS outperforms other competing algorithms on the whole, and the relaxed start strategy is also effective for other TVP-based PCA methods.