Inductive Learning
Learning to Interpret Natural Language Instructions
MacGlashan, James (University of Maryland, Baltimore County) | Babes-Vroman, Monica (Rutgers University) | Winner, Kevin (University of Maryland, Baltimore County) | Gao, Ruoyuan (Rutgers University) | Adjogah, Richard (University of Maryland, Baltimore County) | desJardins, Marie (University of Maryland, Baltimore County) | Littman, Michael (Rutgers University) | Muresan, Smaranda (Rutgers University)
We address the problem of training an artificial agent to follow verbal commands using a set of instructions paired with demonstration traces of appropriate behavior. From this data, a mapping from instructions to tasks is learned, enabling the agent to carry out new instructions in novel environments. Our system consists of three components: semantic parsing (SP), inverse reinforcement learning (IRL), and task abstraction (TA). SP parses sentences into logical form representations, but when learning begins, the domain/task specific meanings of these representations are unknown. IRL takes demonstration traces and determines the likely reward functions that gave rise to these traces, defined over a set of provided features. TA combines results from SP and IRL over a set of training instances to create abstract goal definitions of tasks. TA also provides SP domain specific meanings for its logical forms and provides IRL the set of task-relevant features.
Using the Web to Interactively Learn to Find Objects
Samadi, Mehdi (Carnegie Mellon University) | Kollar, Thomas (Carnegie Mellon University) | Veloso, Manuela (Carnegie Mellon University)
In order for robots to intelligently perform tasks with humans, they must be able to access a broad set of background knowledge about the environments in which they operate. Unlike other approaches, which tend to manually define the knowledge of the robot, our approach enables robots to actively query the World Wide Web (WWW) to learn background knowledge about the physical environment. We show that our approach is able to search the Web to infer the probability that an object, such as a "coffee,'' can be found in a location, such as a "kitchen.'' Our approach, called ObjectEval, is able to dynamically instantiate a utility function using this probability, enabling robots to find arbitrary objects in indoor environments. Our experimental results show that the interactive version of ObjectEval visits 28% fewer locations than the version trained offline and 71% fewer locations than a baseline approach which uses no background knowledge.
Modeling Textual Cohesion for Event Extraction
Huang, Ruihong (University of Utah) | Riloff, Ellen (University of Utah)
Event extraction systems typically locate the role fillers for an event by analyzing sentences in isolation and identifying each role filler independently of the others. We argue that more accurate event extraction requires a view of the larger context to decide whether an entity is related to a relevant event. We propose a bottom-up approach to event extraction that initially identifies candidate role fillers independently and then uses that information as well as discourse properties to model textual cohesion. The novel component of the architecture is a sequentially structured sentence classifier that identifies event-related story contexts. The sentence classifier uses lexical associations and discourse relations across sentences, as well as domain-specific distributions of candidate role fillers within and across sentences. This approach yields state-of-the-art performance on the MUC-4 data set, achieving substantially higher precision than previous systems.
Online Kernel Selection: Algorithms and Evaluations
Yang, Tianbao (Michigan State University) | Mahdavi, Mehrdad (Michigan State University) | Jin, Rong (Michigan State University) | Yi, Jinfeng (Michigan State University) | Hoi, Steven C.H. (Nanyang Technological University)
Kernel methods have been successfully applied to many machine learning problems. Nevertheless, since the performance of kernel methods depends heavily on the type of kernels being used, identifying good kernels among a set of given kernels is important to the success of kernel methods. A straightforward approach to address this problem is cross-validation by training a separate classifier for each kernel and choosing the best kernel classifier out of them. Another approach is Multiple Kernel Learning (MKL), which aims to learn a single kernel classifier from an optimal combination of multiple kernels. However, both approaches suffer from a high computational cost in computing the full kernel matrices and in training, especially when the number of kernels or the number of training examples is very large. In this paper, we tackle this problem by proposing an efficient online kernel selection algorithm. It incrementally learns a weight for each kernel classifier. The weight for each kernel classifier can help us to select a good kernel among a set of given kernels. The proposed approach is efficient in that (i) it is an online approach and therefore avoids computing all the full kernel matrices before training; (ii) it only updates a single kernel classifier each time by a sampling technique and therefore saves time on updating kernel classifiers with poor performance; (iii) it has a theoretically guaranteed performance compared to the best kernel predictor. Empirical studies on image classification tasks demonstrate the effectiveness of the proposed approach for selecting a good kernel among a set of kernels.
Multi-Label Learning by Exploiting Label Correlations Locally
Huang, Sheng-Jun (Nanjing University) | Zhou, Zhi-Hua (Nanjing University)
It is well known that exploiting label correlations is important for multi-label learning. Existing approaches typically exploit label correlations globally, by assuming that the label correlations are shared by all the instances. In real-world tasks, however, different instances may share different label correlations, and few correlations are globally applicable. In this paper, we propose the ML-LOC approach which allows label correlations to be exploited locally. To encode the local influence of label correlations, we derive a LOC code to enhance the feature representation of each instance. The global discrimination fitting and local correlation sensitivity are incorporated into a unified framework, and an alternating solution is developed for the optimization. Experimental results on a number of image, text and gene data sets validate the effectiveness of our approach.
Ontological Smoothing for Relation Extraction with Minimal Supervision
Zhang, Congle (University of Washington) | Hoffmann, Raphael (University of Washington) | Weld, Daniel Sabey (University of Washington)
Relation extraction, the process of converting natural language text into structured knowledge, is increasingly important. Most successful techniques use supervised machine learning to generate extractors from sentences that have been manually labeled with the relations' arguments. Unfortunately, these methods require numerous training examples, which are expensive and time-consuming to produce. This paper presents ontological smoothing, a semi-supervisedtechnique that learns extractors for a set of minimally-labeledrelations. Ontological smoothing has three phases. First, itgenerates a mapping between the target relations and a backgroundknowledge-base. Second, it uses distant supervision toheuristically generate new training examples for the targetrelations. Finally, it learns an extractor from a combination of theoriginal and newly-generated examples. Experiments on 65 relationsacross three target domains show that ontological smoothing candramatically improve precision and recall, even rivaling fully supervisedperformance in many cases.
A Mouse-Trajectory Based Model for Predicting Query-URL Relevance
Hengjie, Song ( Nanyang Technological University Baidu Inc. ) | Liao, Ruoxue (Baidu Inc.) | Zhang, Xiangliang (King Abdullah University of Science and Technology) | Miao, Chunyan (Nanyang Technological University) | Yang, Qiang (Hong Kong University of Science and Technology)
For the learning-to-ranking algorithms used in commercial search engines, a conventional way to generate the training examples is to employ professional annotators to label the relevance of query-url pairs. Since label quality depends on the expertise of annotators to a large extent, this process is time-consuming and labor-intensive. Automatically generating labels from click-through data has been well studied to have comparable or better performance than human judges. Click-through data present usersโ action and imply their satisfaction on search results, but exclude the interactions between users and search results beyond the page-view level (e.g., eye and mouse movements). This paper proposes a novel approach to comprehensively consider the information underlying mouse trajectory and click-through data so as to describe user behaviors more objectively and achieve a better understanding of the user experience. By integrating multi-sources data, the proposed approach reveals that the relevance labels of query-url pairs are related to positions of urls and usersโ behavioral features. Based on their correlations, query-url pairs can be labeled more accurately and search results are more satisfactory to users. The experiments that are conducted on the most popular Chinese commercial search engine (Baidu) validated the rationality of our research motivation and proved that the proposed approach outperformed the state-of-the-art methods.
Approximated Structured Prediction for Learning Large Scale Graphical Models
This manuscript contains the proofs for "A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction" We derive the Lagrangian by introducing the Lagrange multipliers Anymaa (33") for every marginalization constraint:13an bggfiyfiafija): bawdy"), and Lagrange multipliers 0r for every equality We obtain the dual function by minimizing the beliefs over their compact domain, i.e. Deriving the dual by minimizing over the compact set of beliefs enables us to obtain an unconstrained dual, which corresponds to the approximated structured prediction program. Its final form is derived similarly to Claim. We find the optimal Amyyywoxgv) whenever the gradient vanishes, i.e. A Hx7y7afiv (3912) 'i' Anymaa (go) $957971?
An Introduction to Artificial Prediction Markets for Classification
Prediction markets are used in real life to predict outcomes of interest such as presidential elections. This paper presents a mathematical theory of artificial prediction markets for supervised learning of conditional probability estimators. The artificial prediction market is a novel method for fusing the prediction information of features or trained classifiers, where the fusion result is the contract price on the possible outcomes. The market can be trained online by updating the participants' budgets using training examples. Inspired by the real prediction markets, the equations that govern the market are derived from simple and reasonable assumptions. Efficient numerical algorithms are presented for solving these equations. The obtained artificial prediction market is shown to be a maximum likelihood estimator. It generalizes linear aggregation, existent in boosting and random forest, as well as logistic regression and some kernel methods. Furthermore, the market mechanism allows the aggregation of specialized classifiers that participate only on specific instances. Experimental comparisons show that the artificial prediction markets often outperform random forest and implicit online learning on synthetic data and real UCI datasets. Moreover, an extensive evaluation for pelvic and abdominal lymph node detection in CT data shows that the prediction market improves adaboost's detection rate from 79.6% to 81.2% at 3 false positives/volume.
Piecewise Training for Undirected Models
Sutton, Charles, McCallum, Andrew
For many large undirected models that arise in real-world applications, exact maximumlikelihood training is intractable, because it requires computing marginal distributions of the model. Conditional training is even more difficult, because the partition function depends not only on the parameters, but also on the observed input, requiring repeated inference over each training example. An appealing idea for such models is to independently train a local undirected classifier over each clique, afterwards combining the learned weights into a single global model. In this paper, we show that this piecewise method can be justified as minimizing a new family of upper bounds on the log partition function. On three natural-language data sets, piecewise training is more accurate than pseudolikelihood, and often performs comparably to global training using belief propagation.