Inductive Learning
Semi-Supervised Multinomial Naive Bayes for Text Classification by Leveraging Word-Level Statistical Constraint
Zhao, Li (Tsinghua University) | Huang, Minlie (Tsinghua University) | Yao, Ziyu (Beijing University of Posts and Telecommunications) | Su, Rongwei (Samsung Research and Development Institute China - Beijing) | Jiang, Yingying (Samsung Research and Development Institute China - Beijing) | Zhu, Xiaoyan (Tsinghua University)
Multinomial Naive Bayes with Expectation Maximization (MNB-EM) is a standard semi-supervised learning method to augment Multinomial Naive Bayes (MNB) for text classification. Despite its success, MNB-EM is not stable, and may succeed or fail to improve MNB. We believe that this is because MNB-EM lacks the ability to preserve the class distribution on words. In this paper, we propose a novel method to augment MNB-EM by leveraging the word-level statistical constraint to preserve the class distribution on words. The word-level statistical constraints are further converted to constraints on document posteriors generated by MNB-EM. Experiments demonstrate that our method can consistently improve MNB-EM, and outperforms state-of-art baselines remarkably.
Large-Scale Graph-Based Semi-Supervised Learning via Tree Laplacian Solver
Zhang, Yan-Ming (Institute of Automation, Chinese Academy of Sciences) | Zhang, Xu-Yao (Institute of Automation, Chinese Academy of Sciences) | Yuan, Xiao-Tong (Nanjing University of Information Science and Technology) | Liu, Cheng-Lin (Institute of Automation, Chinese Academy of Sciences)
Graph-based Semi-Supervised learning is one of the most popular and successful semi-supervised learning methods. Typically, it predicts the labels of unlabeled data by minimizing a quadratic objective induced by the graph, which is unfortunately a procedure of polynomial complexity in the sample size $n$. In this paper, we address this scalability issue by proposing a method that approximately solves the quadratic objective in nearly linear time. The method consists of two steps: it first approximates a graph by a minimum spanning tree, and then solves the tree-induced quadratic objective function in O(n) time which is the main contribution of this work. Extensive experiments show the significant scalability improvement over existing scalable semi-supervised learning methods.
Robust Semi-Supervised Learning through Label Aggregation
Yan, Yan (University of Technology Sydney) | Xu, Zhongwen (University of Technology Sydney) | Tsang, Ivor W. (University of Technology Sydney) | Long, Guodong (University of Technology Sydney) | Yang, Yi (University of Technology Sydney)
Semi-supervised learning is proposed to exploit both labeled and unlabeled data. However, as the scale of data in real world applications increases significantly, conventional semi-supervised algorithms usually lead to massive computational cost and cannot be applied to large scale datasets. In addition, label noise is usually present in the practical applications due to human annotation, which very likely results in remarkable degeneration of performance in semi-supervised methods. To address these two challenges, in this paper, we propose an efficient RObust Semi-Supervised Ensemble Learning (ROSSEL) method, which generates pseudo-labels for unlabeled data using a set of weak annotators, and combines them to approximate the ground-truth labels to assist semi-supervised learning. We formulate the weighted combination process as a multiple label kernel learning (MLKL) problem which can be solved efficiently. Compared with other semi-supervised learning algorithms, the proposed method has linear time complexity. Extensive experiments on five benchmark datasets demonstrate the superior effectiveness, efficiency and robustness of the proposed algorithm.
Constrained Submodular Minimization for Missing Labels and Class Imbalance in Multi-label Learning
Wu, Baoyuan (King Abdullah University of Science and Technology (KAUST)) | Lyu, Siwei ( State University of New York at Albany ) | Ghanem, Bernard (King Abdullah University of Science and Technology (KAUST))
Although many handle missing labels and class imbalance jointly. We formulate multi-label learning methods have been proposed in recent the problem as a transductive learning problem that years, a main challenge remains for this problem, i.e., the include five components that are label consistency, instancelevel lack of completely labeled training instances. This is important and class-level label smoothness, and two types of class because in many real life applications, most training cardinality (lower and upper) bounds. The first three components instances are only partially labeled, while other labels are are used to propagate the label information from the not provided or missing. One such example is image annotation, provided labels to missing labels, and the latter two components a human labeler can only feasibly annotates each are included to handle two types of the class imbalance training image with a subset of tags, especially when the problem. We first formulate a unified model that combines number of classes/tags is large. Learning from such partially these components as a constrained submodular minimization labeled instances is referred to as the multi-label learning problem (CSM). However, due to the class cardinality with missing labels (MLML) problem (Wu et al. 2014; constraint, it is a NPhard problem.
Multitask Generalized Eigenvalue Program
Wang, Boyu (McGill University) | Pineau, Joelle (McGill University) | Balle, Borja (Lancaster University)
We present a novel multitask learning framework called multitask generalized eigenvalue program (MTGEP), which jointly solves multiple related generalized eigenvalue problems (GEPs). This framework is quite general and can be applied to many eigenvalue problems in machine learning and pattern recognition, ranging from supervised learning to unsupervised learning, such as principal component analysis (PCA), Fisher discriminant analysis (FDA), common spatial pattern (CSP), and so on. The core assumption of our approach is that the leading eigenvectors of related GEPs lie in some subspace that can be approximated by a sparse linear combination of basis vectors. As a result, these GEPs can be jointly solved by a sparse coding approach. Empirical evaluation with both synthetic and benchmark real world datasets validates the efficacy and efficiency of the proposed techniques, especially for grouped multitask GEPs.
Towards Safe Semi-Supervised Learning for Multivariate Performance Measures
Li, Yu-Feng (Nanjing University) | Kwok, James T. (Hong Kong University of Science and Technology) | Zhou, Zhi-Hua (Nanjing University, China)
Semi-supervised learning (SSL) is an important research problem in machine learning. While it is usually expected that the use of unlabeled data can improve performance, in many cases SSL is outperformed by supervised learning using only labeled data. To this end, the construction of a performance-safe SSL method has become a key issue of SSL study. To alleviate this problem, we propose in this paper the UMVP (safe semi-sUpervised learning for MultiVariate Performance measure) method, because of the need of various performance measures in practical tasks. The proposed method integrates multiple semi-supervised learners, and maximizes the worst-case performance gain to derive the final prediction. The overall problem is formulated as a maximin optimization. In oder to solve the resultant difficult maximin optimization, this paper shows that when the performance measure is the Top- k Precision, F β score or AUC, a minimax convex relaxation of the maximin optimization can be solved efficiently. Experimental results show that the proposed method can effectively improve the safeness of SSL under multiple multivariate performance measures.
Robustness of Bayesian Pool-Based Active Learning Against Prior Misspecification
Cuong, Nguyen Viet (National University of Singapore) | Ye, Nan (Queensland University of Technology) | Lee, Wee Sun (National University of Singapore)
We study the robustness of active learning (AL) algorithms against prior misspecification: whether an algorithm achieves similar performance using a perturbed prior as compared to using the true prior. In both the average and worst cases of the maximum coverage setting, we prove that all alpha-approximate algorithms are robust (i.e., near alpha-approximate) if the utility is Lipschitz continuous in the prior. We further show that robustness may not be achieved if the utility is non-Lipschitz. This suggests we should use a Lipschitz utility for AL if robustness is required. For the minimum cost setting, we can also obtain a robustness result for approximate AL algorithms. Our results imply that many commonly used AL algorithms are robust against perturbed priors. We then propose the use of a mixture prior to alleviate the problem of prior misspecification. We analyze the robustness of the uniform mixture prior and show experimentally that it performs reasonably well in practice.
Learning Vector Quantization for Machine Learning - Machine Learning Mastery
A downside of K-Nearest Neighbors is that you need to hang on to your entire training dataset. The Learning Vector Quantization algorithm (or LVQ for short) is an artificial neural network algorithm that lets you choose how many training instances to hang onto and learns exactly what those instances should look like. In this post you will discover the Learning Vector Quantization algorithm. This post was written for developers and assumes no background in statistics or mathematics. The post focuses on how the algorithm works and how to use it for predictive modeling problems.
IllinoisCogComp/illinois-sl
Illinois Structured Learning Package (Illinois-SL) is a general purpose JAVA library for performing structured learning. It houses learning algorithms like averaged Structured Perceptron and Structured SVM with L2-Loss, and provides a minimal interface for your structured learning needs. The training algorithm employed for training SSVM is dual coordinate descent(DCD), which has been proven to have very good convergence properties. Illinois-SL comes with an efficient implementation of DCD with support for multi-threading. Illinois-SL provides a simple and neat framework for developing applications using structured prediction models.
Chinese Relation Extraction by Multiple Instance Learning
Chen, Yu-Ju (National Taiwan University) | Hsu, Jane Yung-jen (National Taiwan University)
Relation extraction, which learns semantic relations of concept pairs from text, is an approach for mining commonsense knowledge. This paper investigates an approach for relation extraction, which helps expand a commonsense knowledge base with little labor work. We proposed a framework that learns new pairs from Chinese corpora by adopting concept pairs in Chinese commonsense knowledge base as seeds. Multiple instance learning is utilized as the learning algorithm for predicting relation for unseen pairs. The performance of our system could be improved by learning multiple iterations. The results in each iteration are manually evaluated and processed to next iteration as seeds. Our experiments extracted new pairs for relations “AtLocation”, “CapableOf”, and “HasProperty”. This study showed that new pairs could be extracted from text without huge humans work.