Support Vector Machines
Semi-supervised Vocabulary-informed Learning
Despite significant progress in object categorization, in recent years, a number of important challenges remain, mainly, ability to learn from limited labeled data and ability to recognize object classes within large, potentially open, set of labels. Zero-shot learning is one way of addressing these challenges, but it has only been shown to work with limited sized class vocabularies and typically requires separation between supervised and unsupervised classes, allowing former to inform the latter but not vice versa. We propose the notion of semi-supervised vocabulary-informed learning to alleviate the above mentioned challenges and address problems of supervised, zero-shot and open set recognition using a unified framework. Specifically, we propose a maximum margin framework for semantic manifold-based recognition that incorporates distance constraints from (both supervised and unsupervised) vocabulary atoms, ensuring that labeled samples are projected closest to their correct prototypes, in the embedding space, than to others. We show that resulting model shows improvements in supervised, zero-shot, and large open set recognition, with up to 310K class vocabulary on AwA and ImageNet datasets.
Developing a Machine Learning Model to QA human meta data attribution โข /r/MachineLearning
I want to develop a machine learning model in R that I can deploy in Java. I want to describe what I've tried, and how it failed, and what my next iteration is so every one here can help guide me. We have a system where a human will examine raw text and assign it meta-data where each piece of meta-data is a separate category. For example, say we had String 1 and available categories A-Z and a human assigned String 1 categories A, B, C, and F. There is a large amount of human QA that happens afterwards to ensure that String 1 either received all the categories it should have and didn't receive any categories it shouldn't have, for example String 1 should not have received the f category. I am tasked with developing a way to automatically detect if a String needs QA after meta-data has been assigned to it.
Semi-supervised Learning with Induced Word Senses for State of the Art Word Sense Disambiguation
Baลkaya, Osman, Jurgens, David
Word Sense Disambiguation (WSD) aims to determine the meaning of a word in context, and successful approaches are known to benefit many applications in Natural Language Processing. Although supervised learning has been shown to provide superior WSD performance, current sense-annotated corpora do not contain a sufficient number of instances per word type to train supervised systems for all words. While unsupervised techniques have been proposed to overcome this data sparsity problem, such techniques have not outperformed supervised methods. In this paper, we propose a new approach to building semi-supervised WSD systems that combines a small amount of sense-annotated data with information from Word Sense Induction, a fully-unsupervised technique that automatically learns the different senses of a word based on how it is used. In three experiments, we show how sense induction models may be effectively combined to ultimately produce high-performance semi-supervised WSD systems that exceed the performance of state-of-the-art supervised WSD techniques trained on the same sense-annotated data. We anticipate that our results and released software will also benefit evaluation practices for sense induction systems and those working in low-resource languages by demonstrating how to quickly produce accurate WSD systems with minimal annotation effort.
An Introduction to Machine Learning
This book presents basic ideas of machine learning in a way that is easy to understand, by providing hands-on practical advice, using simple examples, and motivating students with discussions of interesting applications. The main topics include Bayesian classifiers, nearest-neighbor classifiers, linear and polynomial classifiers, decision trees, neural networks, and support vector machines. Later chapters show how to combine these simple tools by way of "boosting," how to exploit them in more complicated domains, and how to deal with diverse advanced practical issues. One chapter is dedicated to the popular genetic algorithms.
Support Vector Machines for Machine Learning - Machine Learning Mastery
Support Vector Machines are perhaps one of the most popular and talked about machine learning algorithms. They were extremely popular around the time they were developed in the 1990s and continue to be the go-to method for a high-performing algorithm with little tuning. In this post you will discover the Support Vector Machine (SVM) machine learning algorithm. SVM is an exciting algorithm and the concepts are relatively simple. This post was written for developers with little or no background in statistics and linear algebra.
Labeling the Features Not the Samples: Efficient Video Classification with Minimal Supervision
Leordeanu, Marius (Institute of Mathematics of the Romanian Academy) | Radu, Alexandra (Institute of Mathematics of the Romanian Academy) | Baluja, Shumeet (Google Research) | Sukthankar, Rahul (Google Research)
Feature selection is essential for effective visual recognition. We propose an efficient joint classifier learning and feature selection method that discovers sparse, compact representations of input features from a vast sea of candidates, with an almost unsupervised formulation. Our method requires only the following knowledge, which we call the feature sign - whether or not a particular feature has on average stronger values over positive samples than over negatives. We show how this can be estimated using as few as a single labeled training sample per class. Then, using these feature signs, we extend an initial supervised learning problem into an (almost) unsupervised clustering formulation that can incorporate new data without requiring ground truth labels. Our method works both as a feature selection mechanism and as a fully competitive classifier. It has important properties, low computational cost annd excellent accuracy, especially in difficult cases of very limited training data. We experiment on large-scale recognition in video and show superior speed and performance to established feature selection approaches such as AdaBoost, Lasso, greedy forward-backward selection, and powerful classifiers such as SVM.
Concepts Not Alone: Exploring Pairwise Relationships for Zero-Shot Video Activity Recognition
Gan, Chuang (Tsinghua University) | Lin, Ming (University of Michigan, Ann Arbor) | Yang, Yi (University of Technology Sydney) | Melo, Gerard de (Tsinghua University) | Hauptmann, Alexander G. (Carnegie Mellon University)
Vast quantities of videos are now being captured at astonishing rates, but the majority of these are not labelled. To cope with such data, we consider the task of content-based activity recognition in videos without any manually labelled examples, also known as zero-shot video recognition. To achieve this, videos are represented in terms of detected visual concepts, which are then scored as relevant or irrelevant according to their similarity with a given textual query. In this paper, we propose a more robust approach for scoring concepts in order to alleviate many of the brittleness and low precision problems of previous work. Not only do we jointly consider semantic relatedness, visual reliability, and discriminative power. To handle noise and non-linearities in the ranking scores of the selected concepts, we propose a novel pairwise order matrix approach for score aggregation. Extensive experiments on the large-scale TRECVID Multimedia Event Detection data show the superiority of our approach.
TGSum: Build Tweet Guided Multi-Document Summarization Dataset
Cao, Ziqiang (The Hong Kong Polytechnic University) | Chen, Chengyao (The Hong Kong Polytechnic University) | Li, Wenjie (The Hong Kong Polytechnic University) | Li, Sujian (Peking University) | Wei, Furu (Microsoft Research) | Zhou, Ming (Microsoft Research)
The development of summarization research has been significantly hampered by the costly acquisition of reference summaries. This paper proposes an effective way to automatically collect large scales of news-related multi-document summaries with reference to social media's reactions. We utilize two types of social labels in tweets, i.e., hashtags and hyper-links. Hashtags are used to cluster documents into different topic sets. Also, a tweet with a hyper-link often highlights certain key points of the corresponding document. We synthesize a linked document cluster to form a reference summary which can cover most key points. To this aim, we adopt the ROUGE metrics to measure the coverage ratio, and develop an Integer Linear Programming solution to discover the sentence set reaching the upper bound of ROUGE. Since we allow summary sentences to be selected from both documents and high-quality tweets, the generated reference summaries could be abstractive. Both informativeness and readability of the collected summaries are verified by manual judgment. In addition, we train a Support Vector Regression summarizer on DUC generic multi-document summarization benchmarks. With the collected data as extra training resource, the performance of the summarizer improves a lot on all the test sets. We release this dataset for further research.
Robust Semi-Supervised Learning through Label Aggregation
Yan, Yan (University of Technology Sydney) | Xu, Zhongwen (University of Technology Sydney) | Tsang, Ivor W. (University of Technology Sydney) | Long, Guodong (University of Technology Sydney) | Yang, Yi (University of Technology Sydney)
Semi-supervised learning is proposed to exploit both labeled and unlabeled data. However, as the scale of data in real world applications increases significantly, conventional semi-supervised algorithms usually lead to massive computational cost and cannot be applied to large scale datasets. In addition, label noise is usually present in the practical applications due to human annotation, which very likely results in remarkable degeneration of performance in semi-supervised methods. To address these two challenges, in this paper, we propose an efficient RObust Semi-Supervised Ensemble Learning (ROSSEL) method, which generates pseudo-labels for unlabeled data using a set of weak annotators, and combines them to approximate the ground-truth labels to assist semi-supervised learning. We formulate the weighted combination process as a multiple label kernel learning (MLKL) problem which can be solved efficiently. Compared with other semi-supervised learning algorithms, the proposed method has linear time complexity. Extensive experiments on five benchmark datasets demonstrate the superior effectiveness, efficiency and robustness of the proposed algorithm.
Scaling Simultaneous Optimistic Optimization for High-Dimensional Non-Convex Functions with Low Effective Dimensions
Qian, Hong (Nanjing University) | Yu, Yang (Nanjing University)
Simultaneous optimistic optimization (SOO) is a recently proposed global optimization method with a strong theoretical foundation. Previous studies have shown that SOO has a good performance in low-dimensional optimization problems, however, its performance is unsatisfactory when the dimensionality is high. This paper adapts random embedding to scaling SOO, resulting in the RESOO algorithm. We prove that the simple regret of RESOO depends only on the effective dimension of the problem, while that of SOO depends on the dimension of the solution space. Empirically, on some high-dimensional non-convex testing functions as well as hyper-parameter tuning tasks for multi-class support vector machines, RESOO shows significantly improved performance from SOO.