Accuracy
Detection of Imperative and Declarative Question-Answer Pairs in Email Conversations
Kwong, Helen (Stanford University) | Yorke-Smith, Neil (SRI International)
Question-answer pairs extracted from email threads can help construct summaries of the thread, as well as inform semantic-based assistance with email. Previous work dedicated to email threads extracts only questions in interrogative form. We extend the scope of question and answer detection and pairing to encompass also questions in imperative and declarative forms, and to operate at sentence-level fidelity. Building on prior work, our methods are based on learned models over a set of features that include the content, context, and structure of email threads. For two large email corpora, we show that our methods balance precision and recall in extracting question-answer pairs, while maintaining a modest computation time.
Succinct Approximate Counting of Skewed Data
Practical data analysis relies on the ability to count observations of objectssuccinctly and efficiently. Unfortunately the space usage of an exact estimator grows with the size of the a priori set from which objects are drawn while the time required to maintain such an estimator grows with the size of the data set. We present static and on-line approximation schemes that avoid these limitations when approximate frequency estimates are acceptable. Our Log-Frequency Sketch extends the approximate counting algorithm of Morris [Morris1978] to estimate frequencies with bounded relative error via a single pass over a data set. It uses constant space per object when the frequencies follow a power law and can be maintained in constant time per observation. We give an (epsilon, delta)-approximation scheme which we verify empirically on a large natural language data set where, for instance, 95 percent of frequencies are estimated with relative error less than 0.25 using fewer than 11 bits per object in the static case and 15 bits per object on-line.
Bootstrap Voting Experts
Hewlett, Daniel (University of Arizona) | Cohen, Paul (University of Arizona)
Bootstrap Voting Experts (BVE) is an extension to the Voting Experts algorithm for unsupervised chunking of sequences. BVE generates a series of segmentations, each of which incorporates knowledge gained from the previous segmentation. We show that this method of bootstrapping improves the performance of Voting Experts in a variety of unsupervised word segmentation scenarios, and generally improves both precision and recall of the algorithm. We also show that Minimum Description Length (MDL) can be used to choose nearly optimal parameters for Voting Experts in an unsupervised manner.
Acquiring Agent-Based Models of Conflict from Event Data
Taylor, Glenn (Soar Technology, Inc.) | Quist, Michael (Soar Technology, Inc.) | Hicken, Allen (University of Michigan)
Building and using agent-based models is often impractical, in part due to the cost of including expensive subject matter experts (SMEs) in the development process. In this paper, we describe a method for "bootstrapping" model building to lower the cost of overall model development. The models we are interested in here capture dynamic phenomena related to international and subnational conflict. The method of acquiring these models begins with event data drawn from news reports about a conflict region, and infers model characteristics particular to a conflict modeling framework called the Power Structure Toolkit (PSTK). We describe the toolkit and how it has been used prior to this work. We then describe the current problem of modeling conflict and the empirical data available to learn models, and extensions to the PSTK for model generation from this data. We also describe a formative evaluation of the system that compares the performance and costs of models built entirely by an SME against models built with an SME aided by the automated model generation process. Early results indicate at least equivalent prediction rates with significant savings in model generation costs.
A Minimum Description Length Approach to Multitask Feature Selection
Many regression problems involve not one but several response variables (y's). Often the responses are suspected to share a common underlying structure, in which case it may be advantageous to share information across them; this is known as multitask learning. As a special case, we can use multiple responses to better identify shared predictive features -- a project we might call multitask feature selection. This thesis is organized as follows. Section 1 introduces feature selection for regression, focusing on ell_0 regularization methods and their interpretation within a Minimum Description Length (MDL) framework. Section 2 proposes a novel extension of MDL feature selection to the multitask setting. The approach, called the "Multiple Inclusion Criterion" (MIC), is designed to borrow information across regression tasks by more easily selecting features that are associated with multiple responses. We show in experiments on synthetic and real biological data sets that MIC can reduce prediction error in settings where features are at least partially shared across responses. Section 3 surveys hypothesis testing by regression with a single response, focusing on the parallel between the standard Bonferroni correction and an MDL approach. Mirroring the ideas in Section 2, Section 4 proposes a novel MIC approach to hypothesis testing with multiple responses and shows that on synthetic data with significant sharing of features across responses, MIC sometimes outperforms standard FDR-controlling methods in terms of finding true positives for a given level of false positives. Section 5 concludes.
Advanced Measures for Empirical Testing
Baumeister, Joachim (University of Würzburg)
Empirical testing is a very popular evaluation method for the development of intelligent systems. Here, previously solved problems with correct solutions are given as cases to the system. Validity is tested by comparing the expected results with the derived solutions. Besides classic forms of boolean testing of occurring solutions more refined methods are required for a thorough evaluation of real world knowledge systems. We present extended precision and recall functions for interactive knowledge systems that are generalizations of the existing measures. Additionally, we propose a visualization method for inspecting the validation result for interactive systems. A case study with a second-opinion system from the medical domain demonstrates the usefulness of the approach.
Confidence-based Tuning of Nomogram Predictions
Mancill, Tony (Washington State University Vancouver) | Wallace, Scott A (Washington State University Vancouver)
Instance classification using machine learning techniques has numerous applications, from automation to medical diagnosis. In many problem domains, such as spam filtering, classification must be performed quickly across large datasets. In this paper we begin with machine learning techniques based on the naive Bayes classification and attempt to improve classification performance by taking into account attribute confidence intervals. Our prediction functions operate over nominal datasets and retain the asymptotic complexity of one-pass learning and prediction functions. We present preliminary results indicating a modest, albeit inconsistent improvement over the naive Bayes classifier alone.
VipBoost: A More Accurate Boosting Algorithm
Su, Xiaoyuan (Florida Atlantic University) | Khoshgoftaar, Taghi M | Greiner, Russell
Boosting is a well-known method for improving the accuracy of many learning algorithms. In this paper, we propose a novel boosting algorithm, VipBoost (voting on boosting classifications from imputed learning sets), which first generates multiple incomplete datasets from the original dataset by randomly removing a small percentage of observed attribute values, then uses an imputer to fill in the missing values. It then applies AdaBoost (using some base learner) to produce classifiers trained on each of the imputed learning sets, to produce multiple classifiers. The subsequent prediction on a new test case is the most frequent classification from these classifiers. Our empirical results show that VipBoost produces very effective classifiers that significantly improve accuracy for unstable base learners and some stable learners, especially when the initial dataset is incomplete.
Testing Analogical Proportions with Google using Kolmogorov Information Theory
Prade, Henri (Institut de Recherche en Informatique de Toulouse) | Richard, Gilles (British Institute of Technology and E-Commerce)
Analogical reasoning is considered as one of the main mechanisms underlying creativity. "Thinking out of the box" allows the paradigm shift essential to a creative process. More common is the concept of analogical proportion ("2 is to 4 as 4 is to 8") which can be described within an algebraic framework. When it comes to concepts ("engine is to the car as heart is to the human"), we need to investigate a new way to understand this analogical ratio. In this paper, we take inspiration from the formal framework of information theory for proposing a new approach to the evaluation of analogy between concepts. Using Kolmogorov complexity as a backbone providing a clear semantics, we give a practical interpretation for analogy between words viewed as labeling concepts. Making use of Google as a linguistic resource, we provide an implementation of our definitions: experiments show that the accuracy of our definition is quite acceptable and justify the approach.
Mapping Grounded Object Properties across Perceptually Heterogeneous Embodiments
Kira, Zsolt (Georgia Institute of Technology)
As robots become more common, it becomes increasingly useful for them to communicate and effectively share knowledge that they have learned through their individual experiences. Learning from experiences, however, is often-times embodiment-specific; that is, the knowledge learned is grounded in the robot’s unique sensors and actuators. This type of learning raises questions as to how communication and knowledge exchange via social interaction can occur, as properties of the world can be grounded differently in different robots. This is especially true when the robots are heterogeneous, with different sensors and perceptual features used to define the properties. In this paper, we present methods and representations that allow heterogeneous robots to learn grounded property representations, such as that of color categories, and then build models of their similarities and differences in order to map their respective representations. We use a conceptual space representation, where object properties are learned and represented as regions in a metric space, implemented via supervised learning of Gaussian Mixture Models. We then propose to use confusion matrices that are built using instances from each robot, obtained in a shared context, in order to learn mappings between the properties of each robot. Results are demonstrated using two perceptually heterogeneous Pioneer robots, one with a web camera and another with a camcorder.