Inductive Learning
Using Closed Captions as Supervision for Video Activity Recognition
Gupta, Sonal (Stanford University) | Mooney, Raymond J. (University of Texas at Austin)
Recognizing activities in real-world videos is a difficult problem exacerbated by background clutter, changes in camera angle & zoom, and rapid camera movements. Large corpora of labeled videos can be used to train automated activity recognition systems, but this requires expensive human labor and time. This paper explores how closed captions that naturally accompany many videos can act as weak supervision that allows automatically collecting "labeled" data for activity recognition. We show that such an approach can improve activity retrieval in soccer videos. Our system requires no manual labeling of video clips and needs minimal human supervision. We also present a novel caption classifier that uses additional linguistic information to determine whether a specific comment refers to an ongoing activity. We demonstrate that combining linguistic analysis and automatically trained activity recognizers can significantly improve the precision of video retrieval.
Gaussian Process Latent Random Field
Zhong, Guoqiang (Chinese Academy of Sciences) | Li, Wu-Jun (The Hong Kong University of Science and Technology) | Yeung, Dit-Yan (The Hong Kong University of Science and Technology) | Hou, Xinwen (Chinese Academy of Sciences) | Liu, Cheng-Lin (Chinese Academy of Sciences)
Multi-Task Active Learning with Output Constraints
Zhang, Yi (Carnegie Mellon University)
Many problems in information extraction, text mining, natural language processing and other fields exhibit the same property: multiple prediction tasks are related in the sense that their outputs (labels) satisfy certain constraints. In this paper, we propose an active learning framework exploiting such relations among tasks. Intuitively, with task outputs coupled by constraints, active learning can utilize not only the uncertainty of the prediction in a single task but also the inconsistency of predictions across tasks. We formalize this idea as a cross-task value of information criterion, in which the reward of a labeling assignment is propagated and measured over all relevant tasks reachable through constraints. A specific example of our framework leads to the cross entropy measure on the predictions of coupled tasks, which generalizes the entropy used in classical single-task uncertainty sampling. We conduct experiments on two real-world problems: web information extraction and document classification. Empirical results demonstrate the effectiveness of our framework in actively collecting labeled examples for multiple related tasks.
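The contrast between single-task uncertainty and cross-task inconsistency can be illustrated with a minimal sketch. It is not the paper's criterion: the function names, the toy probabilities, and the assumption that the two tasks share a label space and are constrained to agree are all hypothetical choices made for illustration.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q, eps=1e-12):
    """Cross entropy H(p, q) = -sum_i p_i log q_i, with q clipped away from 0."""
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def most_uncertain(pool):
    """Index of the pool example whose prediction has the highest entropy."""
    return max(range(len(pool)), key=lambda i: entropy(pool[i]))

# Toy predicted class distributions for three unlabeled examples (made up).
task_a = [[0.9, 0.1], [0.5, 0.5], [0.8, 0.2]]
# A second task assumed to be constrained to agree with task A (hypothetical).
task_b = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]

# Classical single-task uncertainty sampling looks at task A alone.
single_task_pick = most_uncertain(task_a)

# A cross-task score also reacts to disagreement between the coupled tasks.
cross_task_scores = [cross_entropy(p, q) for p, q in zip(task_a, task_b)]
cross_task_pick = max(range(len(cross_task_scores)),
                      key=cross_task_scores.__getitem__)
```

In this toy pool the single-task rule selects the second example (the 50/50 prediction), while the cross-entropy score selects the third, where each task is fairly confident but the two contradict each other. When both tasks give identical predictions, the cross entropy reduces to the ordinary entropy.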
Transductive Learning on Adaptive Graphs
Zhang, Yan-Ming (Chinese Academy of Sciences) | Zhang, Yu (Hong Kong University of Science and Technology) | Yeung, Dit-Yan (Hong Kong University of Science and Technology) | Liu, Cheng-Lin (Chinese Academy of Sciences) | Hou, Xinwen (Chinese Academy of Sciences)
Graph-based semi-supervised learning methods rely on a smoothness assumption about the data. As a discrete approximation of the data manifold, the graph plays a crucial role in the success of such methods. In most existing methods, graph construction makes use of a predefined weighting function without utilizing label information, even when it is available. In this work, we seek to enhance the performance of graph-based semi-supervised learning by incorporating label information, learning the graph and inferring the labels simultaneously. Specifically, we consider the transductive setting of semi-supervised learning. Using the LogDet divergence to define the objective function, we propose an iterative algorithm for the optimization problem that has a closed-form solution at each step. We perform experiments on both synthetic and real data to demonstrate improvements in the quality of the graph and in classification accuracy.
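For context, the fixed-graph baseline that this abstract contrasts itself with can be sketched as follows. This is not the adaptive, LogDet-based algorithm of the paper: it builds a graph from a predefined Gaussian weighting function and then propagates the known labels over it. The function names, the bandwidth `sigma`, and the toy data are assumptions chosen for illustration.

```python
import math

def gaussian_weights(points, sigma=1.0):
    """Dense graph with predefined weights w_ij = exp(-||xi - xj||^2 / (2 sigma^2))."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return w

def propagate(w, labels, iters=200):
    """Repeatedly set each unlabeled score to the weighted mean of its neighbours.

    labels maps node index -> +1 or -1; labeled nodes stay clamped and
    unlabeled nodes start at 0.
    """
    n = len(w)
    f = [float(labels.get(i, 0.0)) for i in range(n)]
    for _ in range(iters):
        new_f = []
        for i in range(n):
            if i in labels:
                new_f.append(float(labels[i]))
            else:
                total = sum(w[i])
                new_f.append(sum(w[i][j] * f[j] for j in range(n)) / total)
        f = new_f
    return f

# Two well-separated 1-D clusters with one labeled point each (toy data).
points = [(0.0,), (0.2,), (0.4,), (2.0,), (2.2,), (2.4,)]
labels = {0: +1, 5: -1}

scores = propagate(gaussian_weights(points, sigma=0.5), labels)
predicted = [1 if s >= 0 else -1 for s in scores]
```

Note that the weights here depend only on distances between inputs; the label information enters solely through the clamped nodes. The abstract's proposal is to let the labels influence the graph itself.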
Multi-Label Learning with Weak Label
Sun, Yu-Yin (Nanjing University) | Zhang, Yin (Nanjing University) | Zhou, Zhi-Hua (Nanjing University)
Multi-label learning deals with data associated with multiple labels simultaneously. Previous work on multi-label learning assumes that the “full” label set associated with each training instance is given by users. In many applications, however, obtaining the full label set for each instance is difficult and only a “partial” set of labels is available. In such cases, the appearance of a label means that the instance is associated with this label, while the absence of a label does not imply that this label is not proper for the instance. We call this kind of problem the “weak label” problem. In this paper, we propose the WELL (WEak Label Learning) method to solve the weak label problem. We consider that the classification boundary for each label should go across low density regions, and that each label generally has a much smaller number of positive examples than negative examples. The objective is formulated as a convex optimization problem which can be solved efficiently. Moreover, we exploit the correlation between labels by assuming that there is a group of low-rank base similarities, and that the appropriate similarities between instances for different labels can be derived from these base similarities. Experiments validate the performance of WELL.
Cost-Sensitive Semi-Supervised Support Vector Machine
Li, Yu-Feng (Nanjing University, China) | Kwok, James T. (Hong Kong University of Science and Technology) | Zhou, Zhi-Hua (Nanjing University, China)
In this paper, we study cost-sensitive semi-supervised learning, where many of the training examples are unlabeled and different misclassification errors are associated with unequal costs. This scenario occurs in many real-world applications. For example, in disease diagnosis, the cost of erroneously diagnosing a patient as healthy is much higher than that of diagnosing a healthy person as a patient. Also, the acquisition of labeled data requires medical diagnosis, which is expensive, while the collection of unlabeled data such as basic health information is much cheaper. We propose the CS4VM (Cost-Sensitive Semi-Supervised Support Vector Machine) to address this problem. We show that the CS4VM, when given the label means of the unlabeled data, closely approximates the supervised cost-sensitive SVM that has access to the ground-truth labels of all the unlabeled data. This observation leads to an efficient algorithm which first estimates the label means and then trains the CS4VM with the plug-in label means using an efficient SVM solver. Experiments on a broad range of data sets show that the proposed method is capable of reducing the total cost and is computationally efficient.
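The effect of unequal misclassification costs in the diagnosis example can be made concrete with a small sketch. It does not implement CS4VM; it only shows a generic expected-cost decision rule and a total-cost tally. The cost values (10 for a missed patient, 1 for a false alarm), the probabilities, and the function names are hypothetical.

```python
def min_cost_label(p_sick, cost_fn=10.0, cost_fp=1.0):
    """Pick the label with the lower expected misclassification cost.

    p_sick  : estimated probability that the person is sick
    cost_fn : cost of calling a sick patient healthy (false negative)
    cost_fp : cost of calling a healthy person sick (false positive)
    """
    expected_cost_if_healthy = p_sick * cost_fn
    expected_cost_if_sick = (1.0 - p_sick) * cost_fp
    if expected_cost_if_sick <= expected_cost_if_healthy:
        return "sick"
    return "healthy"

def total_cost(y_true, y_pred, cost_fn=10.0, cost_fp=1.0):
    """Sum the cost of every error; correct predictions cost nothing."""
    cost = 0.0
    for truth, guess in zip(y_true, y_pred):
        if truth == "sick" and guess == "healthy":
            cost += cost_fn
        elif truth == "healthy" and guess == "sick":
            cost += cost_fp
    return cost

# With a 10:1 cost ratio, a 20% chance of disease is already enough to flag.
flag_unequal = min_cost_label(0.2)
# With equal costs the same person would be labeled healthy.
flag_equal = min_cost_label(0.2, cost_fn=1.0, cost_fp=1.0)

example_cost = total_cost(["sick", "healthy", "healthy"],
                          ["healthy", "sick", "healthy"])
```

Under these assumed costs the decision threshold moves from 0.5 down to 1/11, which is why a classifier tuned for plain accuracy can incur a high total cost in such settings.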
Hydra: Automatically Configuring Algorithms for Portfolio-Based Selection
Xu, Lin (University of British Columbia) | Hoos, Holger (University of British Columbia) | Leyton-Brown, Kevin (University of British Columbia)
The AI community has achieved great success in designing high-performance algorithms for hard combinatorial problems, given both considerable domain knowledge and considerable effort by human experts. Two influential methods aim to automate this process: automated algorithm configuration and portfolio-based algorithm selection. The former has the advantage of requiring virtually no domain knowledge, but produces only a single solver; the latter exploits per-instance variation, but requires a set of relatively uncorrelated candidate solvers. Here, we introduce Hydra, a novel technique for combining these two methods, thereby realizing the benefits of both. Hydra automatically builds a set of solvers with complementary strengths by iteratively configuring new algorithms. It is primarily intended for use in problem domains for which an adequate set of candidate solvers does not already exist. Nevertheless, we tested Hydra on a widely studied domain, stochastic local search algorithms for SAT, in order to characterize its performance against a well-established and highly competitive baseline. We found that Hydra consistently achieved major improvements over the best existing individual algorithms, and always at least roughly matched — and indeed often exceeded — the performance of the best portfolios of these algorithms.
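The idea of iteratively building a set of solvers with complementary strengths can be sketched with a greedy loop. This is a simplification under stated assumptions, not Hydra itself: candidates come from a fixed, made-up runtime table rather than from an algorithm configurator, and an oracle that always picks the fastest member per instance stands in for a learned selector. All names and numbers are hypothetical.

```python
def portfolio_runtime(portfolio, runtimes, n_instances):
    """Mean runtime when an oracle picks the fastest member on each instance."""
    total = sum(min(runtimes[s][i] for s in portfolio) for i in range(n_instances))
    return total / n_instances

def greedy_portfolio(runtimes, size):
    """Add, one at a time, the candidate that most improves the portfolio."""
    n_instances = len(next(iter(runtimes.values())))
    portfolio = []
    for _ in range(size):
        candidates = [s for s in runtimes if s not in portfolio]
        if not candidates:
            break
        best = min(
            candidates,
            key=lambda s: portfolio_runtime(portfolio + [s], runtimes, n_instances),
        )
        portfolio.append(best)
    return portfolio

# Hypothetical runtimes (seconds) of three configurations on four instances.
runtimes = {
    "config_a": [1.0, 9.0, 9.0, 5.0],
    "config_b": [4.0, 4.0, 4.0, 4.0],
    "config_c": [9.0, 1.0, 1.0, 9.0],
}

chosen = greedy_portfolio(runtimes, size=2)
chosen_runtime = portfolio_runtime(chosen, runtimes, 4)
```

Each candidate is scored by what it adds to the current portfolio rather than by its standalone performance, so the second pick is the configuration that covers the first one's weak instances, even though it is not the best solver on its own.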
Learning to Predict Combinatorial Structures
The major challenge in designing a discriminative learning algorithm for predicting structured data is to address the computational issues arising from the exponential size of the output space. Existing algorithms make different assumptions to ensure efficient, polynomial time estimation of model parameters. For several combinatorial structures, including cycles, partially ordered sets, permutations and other graph classes, these assumptions do not hold. In this thesis, we address the problem of designing learning algorithms for predicting combinatorial structures by introducing two new assumptions: (i) The first assumption is that a particular counting problem can be solved efficiently. The consequence is a generalisation of the classical ridge regression for structured prediction. (ii) The second assumption is that a particular sampling problem can be solved efficiently. The consequence is a new technique for designing and analysing probabilistic structured prediction models. These results can be applied to solve several complex learning problems including but not limited to multi-label classification, multi-category hierarchical classification, and label ranking.
Empirical learning aided by weak domain knowledge in the form of feature importance
Standard hybrid learners that use domain knowledge require strong knowledge that is hard and expensive to acquire. Weaker domain knowledge, however, can still provide the benefits of prior knowledge while being cost-effective to obtain. Weak knowledge in the form of feature relative importance (FRI) is presented and explained. Feature relative importance is a real-valued approximation of a feature's importance provided by experts. The advantage of using this knowledge is demonstrated by IANN, a modified multilayer neural network algorithm. IANN is a very simple modification of the standard neural network algorithm but attains significant performance gains. Experimental results in the field of molecular biology show higher performance than other empirical learning algorithms, including standard backpropagation and support vector machines. IANN's performance is even comparable to that of KBANN, a theory refinement system that uses stronger domain knowledge. This shows that feature relative importance can significantly improve the performance of existing empirical learning algorithms with minimal effort.
ICWSM — A Great Catchy Name: Semi-Supervised Recognition of Sarcastic Sentences in Online Product Reviews
Tsur, Oren (The Hebrew University) | Davidov, Dmitry (The Hebrew University) | Rappoport, Ari (The Hebrew University)
Sarcasm is a sophisticated form of speech act widely used in online communities. Automatic recognition of sarcasm is, however, a novel task. Sarcasm recognition could contribute to the performance of review summarization and ranking systems. This paper presents SASI, a novel Semi-supervised Algorithm for Sarcasm Identification that recognizes sarcastic sentences in product reviews. SASI has two stages: semi-supervised pattern acquisition, and sarcasm classification. We experimented on a data set of about 66000 Amazon reviews for various books and products. Using a gold standard in which each sentence was tagged by 3 annotators, we obtained precision of 77% and recall of 83.1% for identifying sarcastic sentences. We found some strong features that characterize sarcastic utterances. However, a combination of more subtle pattern-based features proved more promising in identifying the various facets of sarcasm. We also speculate on the motivation for using sarcasm in online communities and social networks.