Goto

Collaborating Authors

 Statistical Learning


Similarity Is Not Entailment — Jointly Learning Similarity Transformation for Textual Entailment

AAAI Conferences

Predicting entailment between two given texts is an important task upon which the performance of numerous NLP tasks depend on such as question answering, text summarization, and information extraction. The degree to which two texts are similar has been used extensively as a key feature in much previous work in predicting entailment. However, using similarity scores directly, without proper transformations, results in suboptimal performance. Given a set of lexical similarity measures, we propose a method that jointly learns both (a) a set of non-linear transformation functions for those similarity measures and, (b) the optimal non-linear combination of those transformation functions to predict textual entailment. Our method consistently outperforms numerous baselines, reporting a micro-averaged F-score of 46.48 on the RTE- 7 benchmark dataset. The proposed method is ranked 2-nd among 33 entailment systems participated in RTE-7, demonstrating its competitiveness over numerous other entailment approaches. Although our method is statistically comparable to the current state-of-the-art, we require less external knowledge resources.


Sembler: Ensembling Crowd Sequential Labeling for Improved Quality

AAAI Conferences

Many natural language processing tasks, such as named entity recognition (NER), part of speech (POS) tagging, word segmentation, and etc., can be formulated as sequential data labeling problems. Building a sound labeler requires very large number of correctly labeled training examples, which may not always be possible. On the other hand, crowdsourcing provides an inexpensive yet efficient alternative to collect manual sequential labeling from non-experts. However the quality of crowd labeling cannot be guaranteed, and three kinds of errors are typical: (1) incorrect annotations due to lack of expertise (e.g., labeling gene names from plain text requires corresponding domain knowledge); (2) ignored or omitted annotations due to carelessness or low confidence; (3) noisy annotations due to cheating or vandalism. To correct these mistakes, we present Sembler, a statistical model for ensembling crowd sequential labelings. Sembler considers three types of statistical information: (1) the majority agreement that proves the correctness of an annotation; (2) correct annotation that improves the credibility of the corresponding annotator; (3) correct annotation that enhances the correctness of other annotations which share similar linguistic or contextual features. We evaluate the proposed model on a real Twitter and a synthetical biological data set, and find that Sembler is particularly accurate when more than half of annotators make mistakes.


Using First-Order Logic to Compress Sentences

AAAI Conferences

Sentence compression is one of the most challenging tasks in natural language processing,which may be of increasing interest to many applicationssuch as abstractive summarization and text simplification for mobile devices.In this paper, we present a novel sentence compression model based on first-order logic, using Markov Logic Network.Sentence compression is formulated as a word/phrase deletion problem in this model.By taking advantage of first-order logic, the proposed method is able to incorporate local linguistic features and to capture global dependencies between word deletion operations. Experiments on both written and spoken corpora show that our approach produces competitive performance against the state-of-the-art methods in terms of manual evaluation measures such as importance, grammaticality, and overall quality.


Modeling Textual Cohesion for Event Extraction

AAAI Conferences

Event extraction systems typically locate the role fillers for an event by analyzing sentences in isolation and identifying each role filler independently of the others. We argue that more accurate event extraction requires a view of the larger context to decide whether an entity is related to a relevant event. We propose a bottom-up approach to event extraction that initially identifies candidate role fillers independently and then uses that information as well as discourse properties to model textual cohesion. The novel component of the architecture is a sequentially structured sentence classifier that identifies event-related story contexts. The sentence classifier uses lexical associations and discourse relations across sentences, as well as domain-specific distributions of candidate role fillers within and across sentences. This approach yields state-of-the-art performance on the MUC-4 data set, achieving substantially higher precision than previous systems.


Unsupervised Detection of Music Boundaries by Time Series Structure Features

AAAI Conferences

In music, boundaries may occur because scientific domains, including artificial intelligence (Keogh of multiple changes, such as a change in instrumentation, 2011). Research on time series has a long tradition, but a change in harmony, or a change in tempo. The seminal its application to real-world datasets requires to cope with approach by Foote (2000) estimated these changes by new relevant issues, such as the multiple dimensionality of means of a so-called novelty curve, obtained by sliding a data or limited computational resources. Specifically, dealing short-time checkerboard kernel over the diagonal of a selfsimilarity with large-scale data, (1) algorithms must be efficient, matrix of pairwise sample comparisons. Works inspired i.e. they have to scale, (2) supervised approaches may become by Foote's approach explicitly make use of the concept unfeasible, and (3) solutions must use general techniques, of novelty curves (Paulus et al. 2010). Other musictargeted i.e. they should be as independent of the domain as approaches exploit homogeneities in a time series possible (see Mueen and Keogh 2010 for a more detailed by employing more refined techniques like hidden Markov discussion).


Visual Saliency Map from Tensor Analysis

AAAI Conferences

Modeling visual saliency map of an image provides important information for image semantic understanding in many applications. Most existing computational visual saliency models follow a bottom-up framework that generates independent saliency map in each selected visual feature space and combines them in a proper way. Two big challenges to be addressed explicitly in these methods are (1) which features should be extracted for all pixels of the input image and (2) how to dynamically determine importance of the saliency map generated in each feature space. In order to address these problems, we present a novel saliency map computational model based on tensor decomposition and reconstruction. Tensor representation and analysis not only explicitly represent image's color values but also imply two important relationships inherent to color image. One is reflecting spatial correlations between pixels and the other one is representing interplay between color channels. Therefore, saliency map generator based on the proposed model can adaptively find the most suitable features and their combinational coefficients for each pixel. Experiments on a synthetic image set and a real image set show that our method is superior or comparable to other prevailing saliency map models.


Performance and Preferences: Interactive Refinement of Machine Learning Procedures

AAAI Conferences

Problem-solving procedures have been typically aimed at achieving well-defined goals or satisfying straightforward preferences. However, learners and solvers may often generate rich multiattribute results with procedures guided by sets of controls that define different dimensions of quality. We explore methods that enable people to explore and express preferences about the operation of classification models in supervised multiclass learning. We leverage a leave-one-out confusion matrix that provides users with views and real-time controls of a model space. The approach allows people to consider in an interactive manner the global implications of local changes in decision boundaries. We focus on kernel classifiers and show the effectiveness of the methodology on a variety of tasks.


Learning to Learn: Algorithmic Inspirations from Human Problem Solving

AAAI Conferences

We harness the ability of people to perceive and interact with visual patterns in order to enhance the performance of a machine learning method. We show how we can collect evidence about how people optimize the parameters of an ensemble classification system using a tool that provides a visualization of misclassification costs. Then, we use these observations about human attempts to minimize cost in order to extend the performance of a state-of-the-art ensemble classification system. The study highlights opportunities for learning from evidence collected about human problem solving to refine and extend automated learning and inference.


Supervised Probabilistic Robust Embedding with Sparse Noise

AAAI Conferences

Many noise models do not faithfully reflect the noise processes introduced during data collection in many real-world applications. In particular, we argue that a type of noise referred to as sparse noise is quite commonly found in many applications and many existing works have been proposed to model such sparse noise. However, all the existing works only focus on unsupervised learning without considering the supervised information, i.e., label information. In this paper, we consider how to model and handle sparse noise in the context of embedding high-dimensional data under a probabilistic formulation for supervised learning. We propose a supervised probabilistic robust embedding (SPRE) model in which data are corrupted either by sparse noise or by a combination of Gaussian and sparse noises. By using the Laplace distribution as a prior to model sparse noise, we devise a two-fold variational EM learning algorithm in which the update of model parameters has analytical solution. We report some classification experiments to compare SPRE with several related models.


Online Kernel Selection: Algorithms and Evaluations

AAAI Conferences

Kernel methods have been successfully applied to many machine learning problems. Nevertheless, since the performance of kernel methods depends heavily on the type of kernels being used, identifying good kernels among a set of given kernels is important to the success of kernel methods. A straightforward approach to address this problem is cross-validation by training a separate classifier for each kernel and choosing the best kernel classifier out of them. Another approach is Multiple Kernel Learning (MKL), which aims to learn a single kernel classifier from an optimal combination of multiple kernels. However, both approaches suffer from a high computational cost in computing the full kernel matrices and in training, especially when the number of kernels or the number of training examples is very large. In this paper, we tackle this problem by proposing an efficient online kernel selection algorithm. It incrementally learns a weight for each kernel classifier. The weight for each kernel classifier can help us to select a good kernel among a set of given kernels. The proposed approach is efficient in that (i) it is an online approach and therefore avoids computing all the full kernel matrices before training; (ii) it only updates a single kernel classifier each time by a sampling technique and therefore saves time on updating kernel classifiers with poor performance; (iii) it has a theoretically guaranteed performance compared to the best kernel predictor. Empirical studies on image classification tasks demonstrate the effectiveness of the proposed approach for selecting a good kernel among a set of kernels.