The Role of Knowledge-based Features in Polarity Classification at Sentence Level
Wiegand, Michael (Saarland University) | Klakow, Dietrich (Saarland University)
Though polarity classification has been extensively explored at document level, there has been little work investigating feature design at sentence level. Because a sentence contains only a small number of words, polarity classification at sentence level differs substantially from document-level classification: the resulting bag-of-words feature vectors tend to be very sparse, which lowers classification accuracy. In this paper, we show that performance can be improved by adding features specifically designed for sentence-level polarity classification. We consider both explicit polarity information and various linguistic features. A large proportion of the improvement that can be obtained by using polarity information can also be achieved by using a set of simple domain-independent linguistic features.
c-rater: Automatic Content Scoring for Short Constructed Responses
Sukkarieh, Jana Zuheir (Educational Testing Service) | Blackmore, John (Educational Testing Service)
The education community is moving towards constructed or free-text responses and computer-based assessment. At the same time, progress in natural language processing and knowledge representation has made it possible to consider free-text or constructed responses without having to fully understand the text. c-rater is a technology at Educational Testing Service (ETS) used for automatic content scoring for short, free-text responses. This paper describes some of the major developments made in c-rater recently.
Testing Analogical Proportions with Google using Kolmogorov Information Theory
Prade, Henri (Institut de Recherche en Informatique de Toulouse) | Richard, Gilles (British Institute of Technology and E-Commerce)
Analogical reasoning is considered one of the main mechanisms underlying creativity. "Thinking out of the box" allows the paradigm shift essential to a creative process. More common is the concept of analogical proportion ("2 is to 4 as 4 is to 8"), which can be described within an algebraic framework. When it comes to concepts ("engine is to the car as heart is to the human"), we need to investigate a new way to understand this analogical ratio. In this paper, we take inspiration from the formal framework of information theory to propose a new approach to the evaluation of analogy between concepts. Using Kolmogorov complexity as a backbone providing a clear semantics, we give a practical interpretation for analogy between words viewed as labeling concepts. Making use of Google as a linguistic resource, we provide an implementation of our definitions: experiments show that the accuracy of our definition is quite acceptable and justify the approach.
Computational Replication of Human Paraphrase Assessment
McCarthy, Philip Michael (The University of Memphis) | Cai, Zhiqiang (The University of Memphis) | McNamara, Danielle S. (The University of Memphis)
Two sentences are paraphrases if their meanings are equivalent but their words and syntax are different. Paraphrasing can be used to aid comprehension, stimulate prior knowledge, and assist in writing skills development. While automated paraphrase assessment is both commonplace and useful, research has centered solely on artificial, edited paraphrases and has used only binary dimensions (i.e., is or is-not a paraphrase). In this study, we use 1,998 natural paraphrases generated by high school students that have been assessed along 10 dimensions of paraphrase (e.g., semantic completeness). This study investigates the components of paraphrase quality emerging from these dimensions, and examines whether computational approaches (e.g., LSA, MED) can simulate those human evaluations. The results suggest that semantic and syntactic evaluations are the primary components of paraphrase quality, and that computationally light systems such as LSA (semantics) and MED (syntax) present promising approaches to simulating human evaluations of paraphrases.
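As a rough illustration of a computationally light syntactic measure, the sketch below computes a word-level edit distance between a target sentence and a paraphrase. It assumes that MED refers to minimal edit distance, which the abstract does not spell out, and the function name `edit_distance` is hypothetical; it is not the study's implementation.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists).

    Assumption: unit cost for insertion, deletion, and substitution.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


target = "the cat sat".split()
paraphrase = "the dog sat down".split()
distance = edit_distance(target, paraphrase)  # one substitution plus one insertion
```

Operating on token lists rather than characters keeps the measure sensitive to word order and word choice, which is closer to a syntactic comparison than a character-level distance would be.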
Paraphrase Identification Using Weighted Dependencies and Word Semantics
Lintean, Mihai (University of Memphis) | Rus, Vasile (University of Memphis)
In this paper we propose a novel approach to the task of paraphrase identification. The proposed approach quantifies both the similarity and dissimilarity between two sentences. The similarity and dissimilarity are assessed based on lexico-semantic information, i.e., word semantics, and syntactic information in the form of dependencies, which are explicit syntactic relations between words in a sentence. Word semantics requires mapping words onto concepts in a taxonomy and then using word-to-word similarity metrics to compute their semantic relatedness. Dependencies are obtained using state-of-the-art dependency parsers. One important aspect of our approach is the weighting of missing dependencies, i.e., syntactic relations present in one sentence but not the other. We report experimental results on the Microsoft Paraphrase Corpus, a standard data set for evaluating approaches to paraphrase identification. The experiments showed that the proposed approach offers state-of-the-art results. In particular, our approach offers better precision when compared to other state-of-the-art systems.
CombiTagger: A System for Developing Combined Taggers
Henrich, Verena (UAS Darmstadt) | Reuter, Timo (UAS Darmstadt) | Loftsson, Hrafn (Reykjavik University)
The main task of part-of-speech (PoS) tagging is to assign the appropriate morphosyntactic category to each word in a sentence. A combination of different PoS taggers usually results in higher tagging accuracy than that obtained by using only a single tagger. We present a new language- and tagset-independent system, CombiTagger, which automatically combines the output of several taggers. The system, which is open source, provides algorithms for simple and weighted voting, but it is extensible so that other combination algorithms can be added easily. We demonstrate the functionality of CombiTagger by using it to develop and evaluate combined taggers for Icelandic. The most accurate individual tagger obtains an accuracy of 91.83%. CombiTagger achieves 93.09%-93.41% accuracy by combining the output of five or six taggers using simple and weighted voting.
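The two voting schemes named in the abstract can be sketched for a single token as follows. This is a minimal illustration, not CombiTagger's code: the function names, the tie-breaking rule (the earliest tagger in the list wins), and the example weights are all assumptions.

```python
from collections import Counter, defaultdict


def simple_vote(tags):
    """Pick the tag proposed by the most taggers.

    Assumption: tags is non-empty and ties go to the earliest tagger.
    """
    counts = Counter(tags)
    best = max(counts.values())
    for tag in tags:
        if counts[tag] == best:
            return tag


def weighted_vote(tags, weights):
    """Pick the tag with the largest summed tagger weight.

    Assumption: weights[i] reflects how much tagger i is trusted,
    for example its accuracy on held-out data.
    """
    scores = defaultdict(float)
    for tag, weight in zip(tags, weights):
        scores[tag] += weight
    best = max(scores.values())
    for tag in tags:
        if scores[tag] == best:
            return tag


# Hypothetical proposals from three taggers for one word.
proposals = ["NN", "VB", "NN"]
majority = simple_vote(proposals)
trusted = weighted_vote(proposals, [0.2, 0.9, 0.3])
```

With these made-up weights the two schemes disagree: the majority favors "NN", while the single highly weighted tagger outweighs the other two and selects "VB".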
Assessment of LDAT as a Grammatical Diversity Assessment Tool
Healy, Scott Leigh (The University of Memphis) | Weintraub, Joseph D. (The University of Memphis) | McCarthy, Philip M. (The University of Memphis) | Hall, Charles E. (The University of Memphis) | McNamara, Danielle S. (The University of Memphis)
The purpose of this study is to evaluate the validity of measuring grammatical diversity with a specifically designed Lexical Diversity Assessment Tool (LDAT). A secondary objective is to use LDAT to determine if the level of difficulty assigned to English as a Second Language (ESL) texts corresponds to increases in grammatical, lexical, and temporal diversity. Other methods of lexical diversity assessment, such as type-token ratio (TTR), have been used with varying accuracy in an effort to determine the complexity or level of texts. We analyzed 120 ESL texts independently assigned by their sources to one of four levels (Beginner, Lower-intermediate, Upper-intermediate, and Advanced). We demonstrated that LDAT significantly reflected the grammatical diversity within these texts. While the findings conflicted with the prediction that grammatical and lexical diversity would increase with assigned level, we concluded that the implementation of LDAT in text design could provide reliable assessments of grammatical diversity.
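For reference, the type-token ratio mentioned above is the number of distinct word forms divided by the total number of words. The sketch below assumes lowercased whitespace tokenization and a hypothetical function name; it illustrates TTR only and says nothing about how LDAT itself is computed.

```python
def type_token_ratio(text):
    """Return distinct tokens divided by total tokens (0.0 for empty text).

    Assumption: tokens are lowercased, whitespace-separated strings.
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Five tokens, four distinct types ("the" repeats).
ratio = type_token_ratio("The cat saw the dog")
```

Because longer texts tend to repeat words more often, raw TTR generally falls as text length grows, which is one reason it can behave inconsistently when comparing texts of different sizes.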
A Coh-Metrix Analysis of Variation among Biomedical Abstracts
Duncan, Benjamin (Texas A&M University) | Hall, Charles (University of Memphis)
Using the already validated Coh-Metrix tool, this study examines whether there are significant linguistic and discourse differences between biomedical abstracts written in American and Korean English. The study also accounts for variation among journals’ countries of origin, distinguishing biomedical journals published in the United States from those published in South Korea. The study is motivated by the growing number of second language (L2) biomedical researchers who attempt to publish their research in national and international English-language journals but find themselves locked out of the discussion because of differences in linguistic and discourse conventions. The present study aims to provide a more thorough and quantitative understanding of the prototypical linguistic components in biomedical rhetoric, and to suggest how word-, sentence-, and discourse-level structures can be researched, taught, and developed into materials. This improved understanding is expected to provide a powerful apparatus for the promotion of L2 English writers in the biomedical field.
Hierarchical Soft Clustering and Automatic Text Summarization for Accessing the Web on Mobile Devices for Visually Impaired People
Dias, Gaël Harry (University of Beira Interior) | Pais, Sebastião (University of Beira Interior) | Cunha, Fernando (University of Beira Interior) | Costa, Hugo (University of Beira Interior) | Machado, David (University of Beira Interior) | Barbosa, Tiago (University of Beira Interior) | Martins, Bruno (University of Beira Interior)
In this paper, we propose a universal solution to web search and web browsing on handheld devices for visually impaired people. For this purpose, we propose (1) to automatically cluster web page results and (2) to summarize all the information in web pages so that speech-to-speech interaction is used efficiently to access information.
A New Method for Measuring English Verb's Metaphor Making Potential
Chen, Zili (City University of Hong Kong) | Webster, Jonathan J. (City University of Hong Kong) | Hao, Tianyong (City University of Hong Kong) | Chow, Ian C. (City University of Hong Kong)
A general practice in the research of metaphor has been to investigate its behavior and function in different contexts. The current study investigates the notion that verbs possess a metaphor-making potential, in an initial context-free experiment with metaphor. The goal of this paper is to carry out an in-depth case study of a group of English core verbs using WordNet and the SUMO ontology. To operationalize the measurement of an English verb’s metaphor-making potential, a new algorithm has been developed and implemented in a program. Finally, it has been observed that higher-frequency verbs generally possess greater metaphor-making potential, while a verb’s metaphor-making potential is also strongly influenced by its functional category.