Grammars & Parsing
Learning Sentence-internal Temporal Relations
In this paper we propose a data-intensive approach for inferring sentence-internal temporal relations. Temporal inference is relevant for practical NLP applications that either extract or synthesize temporal information (e.g., summarisation, question answering). Our method bypasses the need for manual coding by exploiting the presence of markers like "after", which overtly signal a temporal relation. We first show that models trained on main and subordinate clauses connected with a temporal marker achieve good performance on a pseudo-disambiguation task simulating temporal inference (during testing, the temporal marker is treated as unseen and the models must select the right marker from a set of possible candidates). Secondly, we assess whether the proposed approach holds promise for the semi-automatic creation of temporal annotations. Specifically, we use a model trained on noisy and approximate data (i.e., main and subordinate clauses) to predict intra-sentential relations present in TimeBank, a corpus annotated with rich temporal information. Our experiments compare and contrast several probabilistic models differing in their feature space, linguistic assumptions and data requirements. We evaluate performance against gold-standard corpora and also against human subjects.
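To make the pseudo-disambiguation setup concrete, here is a minimal sketch that assumes a multinomial Naive Bayes model over the words of the two clauses. The abstract does not specify which features or models are used, so the `M:`/`S:` clause prefixes, the add-one smoothing, the helper names `train` and `predict`, and the toy training triples are all illustrative assumptions, not the authors' method. At test time the marker is hidden and the model picks the highest-scoring candidate.

```python
import math
from collections import Counter, defaultdict


def features(main, sub):
    # Hypothetical feature space: words tagged with the clause they come from.
    return [f"M:{w}" for w in main] + [f"S:{w}" for w in sub]


def train(examples):
    """examples: list of (main_tokens, marker, sub_tokens) triples."""
    marker_counts = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for main, marker, sub in examples:
        marker_counts[marker] += 1
        for f in features(main, sub):
            feat_counts[marker][f] += 1
            vocab.add(f)
    return marker_counts, feat_counts, vocab


def predict(model, main, sub, candidates):
    """Pick the candidate marker maximising log P(m) + sum log P(f | m)."""
    marker_counts, feat_counts, vocab = model
    total = sum(marker_counts.values())
    v = len(vocab) + 1  # +1 reserves mass for unseen features
    best, best_score = None, -math.inf
    for m in candidates:
        # Add-one smoothed prior and likelihoods.
        score = math.log((marker_counts[m] + 1) / (total + len(candidates)))
        denom = sum(feat_counts[m].values()) + v
        for f in features(main, sub):
            score += math.log((feat_counts[m][f] + 1) / denom)
        if score > best_score:
            best, best_score = m, score
    return best


# Toy (hypothetical) training data: clause pairs joined by a temporal marker.
examples = [
    ("we ate dinner".split(), "after", "the guests arrived".split()),
    ("we cleaned up".split(), "after", "the guests left".split()),
    ("we waited".split(), "until", "the guests arrived".split()),
    ("we waited".split(), "until", "the rain stopped".split()),
]
model = train(examples)
print(predict(model, "we waited".split(), "the bus came".split(), ["after", "until"]))  # prints "until"
```

Treating the marker as a hidden class keeps evaluation cheap: accuracy is simply the fraction of held-out sentences whose original marker is recovered.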
Strange Beta: An Assistance System for Indoor Rock Climbing Route Setting Using Chaotic Variations and Machine Learning
Phillips, Caleb, Becker, Lee, Bradley, Elizabeth
This paper applies machine learning and the mathematics of chaos to the task of designing indoor rock-climbing routes. Chaotic variation has been used to great advantage in music and dance, but the challenges here are quite different, beginning with the representation. We present a formalized system for transcribing rock climbing problems, then describe a variation generator that is designed to support human route-setters in designing new and interesting climbing problems. This variation generator, termed Strange Beta, combines chaos and machine learning, using the former to introduce novelty and the latter to smooth transitions in a manner that is consistent with the style of the climbs. This entails parsing the domain-specific natural language that rock climbers use to describe routes and movement and then learning the patterns in the results. We validated this approach with a pilot study in a small university rock climbing gym, followed by a large blinded study in a commercial climbing gym, in cooperation with experienced climbers and expert route setters. The results show that Strange Beta can help a human setter produce routes that are at least as good as, and in some cases better than, those produced in the traditional manner.
Parsing Combinatory Categorial Grammar with Answer Set Programming: Preliminary Report
Lierler, Yuliya, Schüller, Peter
Combinatory categorial grammar (CCG) is a grammar formalism used for natural language parsing. CCG assigns structured lexical categories to words and uses a small set of combinatory rules to combine these categories to parse a sentence. In this work we propose and implement a new approach to CCG parsing that relies on a prominent knowledge representation formalism, answer set programming (ASP), a declarative programming paradigm. We formulate the task of CCG parsing as a planning problem and use an ASP computational tool to compute solutions that correspond to valid parses. Unlike other approaches, this declarative method does not require implementing a specific parsing algorithm. Our approach aims at producing all semantically distinct parse trees for a given sentence. This goal raises normalization and efficiency issues, which we address by combining and extending existing strategies. We have implemented a CCG parsing toolkit, AspCcgTk, that uses ASP as its main computational means. The C&C supertagger can be used as a preprocessor within AspCcgTk, which allows us to achieve wide-coverage natural language parsing.
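The idea of combining structured categories with a few rules can be illustrated with a conventional bottom-up chart. This is a minimal Python sketch assuming a toy lexicon and only three standard CCG rules (forward application, backward application and forward composition); it is not AspCcgTk and does not use ASP, which instead encodes parsing as a planning problem. The tuple encoding of categories, the `LEXICON` entries and the `parse` helper are hypothetical.

```python
from itertools import product

# A category is an atomic string ("S", "NP") or a tuple (result, slash, argument).
VP = ("S", "\\", "NP")  # S\NP


def fwd_app(x, y):
    # Forward application: X/Y  Y  =>  X
    if isinstance(x, tuple) and x[1] == "/" and x[2] == y:
        return x[0]
    return None


def bwd_app(x, y):
    # Backward application: Y  X\Y  =>  X
    if isinstance(y, tuple) and y[1] == "\\" and y[2] == x:
        return y[0]
    return None


def fwd_comp(x, y):
    # Forward composition: X/Y  Y/Z  =>  X/Z
    if (isinstance(x, tuple) and isinstance(y, tuple)
            and x[1] == "/" and y[1] == "/" and x[2] == y[0]):
        return (x[0], "/", y[2])
    return None


RULES = (fwd_app, bwd_app, fwd_comp)

# Hypothetical toy lexicon.
LEXICON = {
    "John": {"NP"},
    "Mary": {"NP"},
    "saw": {(VP, "/", "NP")},    # (S\NP)/NP
    "see": {(VP, "/", "NP")},
    "might": {(VP, "/", VP)},    # (S\NP)/(S\NP)
}


def parse(lexicon, words):
    """Return the set of categories derivable for the whole word sequence."""
    n = len(words)
    chart = {(i, i + 1): set(lexicon[w]) for i, w in enumerate(words)}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = set()
            for k in range(i + 1, j):
                for x, y in product(chart[(i, k)], chart[(k, j)]):
                    for rule in RULES:
                        result = rule(x, y)
                        if result is not None:
                            cell.add(result)
            chart[(i, j)] = cell
    return chart[(0, n)]


print(parse(LEXICON, "John might see Mary".split()))  # {'S'}
```

For "John might see Mary", both composition ("might see" as (S\NP)/NP) and plain application derive S; the set-based chart merges them, which hints at why enumerating only *semantically distinct* parses requires normalization.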
Beyond Flickr: Not All Image Tagging Is Created Equal
Klavans, Judith L. (University of Maryland College Park) | Guerra, Raul (University of Maryland) | LaPlante, Rebecca (University of Maryland) | Bachta, Ed (Indianapolis Museum of Art) | Stein, Robert (Indianapolis Museum of Art)
This paper reports on the linguistic analysis of a tag set of nearly 50,000 tags collected as part of the steve.museum project. The tags describe images of objects in museum collections. We present our results on morphological, part-of-speech and semantic analysis. We demonstrate that deeper tag processing provides valuable information for organizing and categorizing social tags. This promises to improve access to museum objects by leveraging the characteristics of tags and the relationships between them rather than treating them as individual items. The paper shows the value of using deep computational linguistic techniques in interdisciplinary projects on tagging images of objects in museums and libraries. We compare our data and analysis to Flickr and other image tagging projects.
#hardtoparse: POS Tagging and Parsing the Twitterverse
Foster, Jennifer (Dublin City University) | Cetinoglu, Ozlem (Dublin City University) | Wagner, Joachim (Dublin City University) | Roux, Joseph Le (LIF - CNRS) | Hogan, Stephen (Dublin City University) | Nivre, Joakim (Uppsala University) | Hogan, Deirdre (Dublin City University) | Genabith, Josef van (Dublin City University)
We evaluate the statistical dependency parser, Malt, on a new dataset of sentences taken from tweets. We use a version of Malt which is trained on gold standard phrase structure Wall Street Journal (WSJ) trees converted to Stanford labelled dependencies. We observe a drastic drop in performance moving from our in-domain WSJ test set to the new Twitter dataset, much of which has to do with the propagation of part-of-speech tagging errors. Retraining Malt on dependency trees produced by a state-of-the-art phrase structure parser, which has itself been self-trained on Twitter material, results in a significant improvement. We analyse this improvement by examining in detail the effect of the retraining on individual dependency types.
Embodied Language Processing: A New Generation of Language Technology
Pastra, Katerina (Cognitive Systems Research Institute) | Balta, Eirini (Cognitive Systems Research Institute) | Dimitrakis, Panagiotis (Cognitive Systems Research Institute) | Karakatsiotis, Giorgos (Cognitive Systems Research Institute)
At a computational level, language processing tasks are traditionally performed in a language-only space/context, isolated from perception and action. However, at a cognitive level, language processing has been shown experimentally to be embodied, i.e. to inform and be informed by perception and action. In this paper, we argue that embodied cognition dictates the development of a new generation of language processing tools that bridge the gap between the symbolic and the sensorimotor representation spaces. We describe the tasks and challenges that such tools need to address and provide an overview of the first such suite of processing tools developed in the framework of the POETICON project.
Through the Twitter Glass: Detecting Questions in Micro-Text
Dent, Kyle D. (Palo Alto Research Center) | Paul, Sharoda A. (Palo Alto Research Center)
In a separate study, we were interested in understanding people's Q&A habits on Twitter. Finding questions within Twitter turned out to be difficult, so we considered applying some traditional NLP approaches to the problem. On the one hand, Twitter is full of idiosyncrasies, which make processing it difficult. On the other hand, tweets are very restricted in length and tend to employ simple syntactic constructions, which could help the performance of NLP processing. To assess the viability of applying NLP to Twitter, we built a pipeline of tools designed specifically for Twitter input, for the task of finding questions in tweets. This work is still preliminary, but in this paper we discuss the techniques we used and the lessons we learned.
Defining the Complexity of an Activity
Sahaf, Yasamin (Washington State University) | Krishnan, Narayanan Chatapuram (Washington State University) | Cook, Diane J. (Washington State University)
Activity recognition is a widely researched area with applications in health care, security and other domains. Because each recognition system considers its own set of activities and sensors, it is difficult to compare the performance of these different systems and, more importantly, to select an appropriate set of technologies and tools for recognizing an activity. In this work-in-progress paper we attempt to characterize activities in terms of a complexity measure. We define activity complexity along three dimensions (sensing, computation and performance) and illustrate different parameters that characterize these dimensions. We look at grammars for representing activities and use grammar complexity as a measure of activity complexity. We then describe how these measurements can help evaluate the complexity of activities of daily living that are commonly considered by various researchers.
WikiSimple: Automatic Simplification of Wikipedia Articles
Woodsend, Kristian (University of Edinburgh) | Lapata, Mirella (University of Edinburgh)
Text simplification aims to rewrite text into simpler versions and thus make information accessible to a broader audience (e.g., non-native speakers, children, and individuals with language impairments). In this paper, we propose a model that simplifies documents automatically while selecting their most important content and rewriting them in a simpler style. We learn content selection rules from same-topic Wikipedia articles written in the main encyclopedia and its Simple English variant. We also use the revision histories of Simple Wikipedia articles to learn a quasi-synchronous grammar of simplification rewrite rules. Based on an integer linear programming formulation, we develop a joint model where preferences based on content and style are optimized simultaneously. Experiments on simplifying main Wikipedia articles show that our method significantly reduces the reading difficulty, while still capturing the important content.
Tree Sequence Kernel for Natural Language
Sun, Jun (National University of Singapore) | Zhang, Min (Institute for Infocomm Research) | Tan, Chew Lim (National University of Singapore)
We propose Tree Sequence Kernel (TSK), which implicitly enumerates the structural features of sequences of subtrees embedded in the phrasal parse tree. By incorporating the capabilities of sequence kernels, TSK enriches the tree kernel with tree sequence features, which may provide additional useful patterns for machine learning applications. Two approaches to penalizing the substructures are proposed, and both can be computed by efficient dynamic programming algorithms. Evaluations are performed on two natural language tasks, i.e. Question Classification and Relation Extraction. Experimental results suggest that TSK outperforms the tree kernel on both tasks, which also reveals that structural features made up of multiple subtrees are effective and play a complementary role to single-tree structures.
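TSK extends the standard tree kernel. For reference, the sketch below implements the well-known subset tree kernel of Collins and Duffy, which counts shared tree fragments by dynamic programming over node pairs, with a decay factor `lam` that down-weights larger fragments. It is a baseline for illustration only, not TSK itself and not necessarily either of the penalization schemes proposed here; the tuple encoding of trees and the function names are assumptions.

```python
from functools import lru_cache

# A tree is a tuple (label, child1, child2, ...); words are plain strings.


def production(node):
    # The grammar rule at a node: its label plus the labels/words of its children.
    label, *children = node
    return (label, tuple(c if isinstance(c, str) else c[0] for c in children))


def is_preterminal(node):
    return all(isinstance(c, str) for c in node[1:])


def nodes(tree):
    # All internal (non-word) nodes of the tree.
    out = [tree]
    for child in tree[1:]:
        if not isinstance(child, str):
            out.extend(nodes(child))
    return out


def tree_kernel(t1, t2, lam=1.0):
    """Subset tree kernel: weighted count of tree fragments shared by t1 and t2."""

    @lru_cache(maxsize=None)
    def common(n1, n2):
        # Number of shared fragments rooted at both n1 and n2.
        if production(n1) != production(n2):
            return 0.0
        if is_preterminal(n1):
            return lam
        result = lam
        for c1, c2 in zip(n1[1:], n2[1:]):
            if isinstance(c1, str) or isinstance(c2, str):
                continue  # matching words are already checked by production()
            result *= 1 + common(c1, c2)
        return result

    return sum(common(a, b) for a in nodes(t1) for b in nodes(t2))


T1 = ("NP", ("D", "the"), ("N", "dog"))
T2 = ("NP", ("D", "a"), ("N", "dog"))
print(tree_kernel(T1, T2))  # 3.0
```

The memoized `common` table is what keeps the cost quadratic in the number of nodes rather than exponential in the number of fragments; a tree sequence kernel additionally has to account for fragments spanning several adjacent subtrees.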