Speech Acts, Dialogues and the Common Ground

AAAI Conferences

The formal semantics of speech acts, even in the classical framework of illocutionary logic, requires considerations that go beyond individual speech activity and beyond the interpretation of individual sentences. We show how the formal semantics of speech acts can be extended to take into account the social effects and interactive aspects of illocutionary activity. To illustrate our approach, we focus on the semantics of assertions and descriptive discourse, contrasting the individual aspect of speaker's meaning and the epistemic effects of assertion making. The approach presented in this paper generalizes to all other types of illocutionary acts, adding specific content to the conversational record that registers the common ground of speakers and hearers as a dialogue unfolds.


A Linguistic Analysis of Expert-Generated Paraphrases

AAAI Conferences

The authors used the computational tool Coh-Metrix to examine expert writers’ paraphrases and, in particular, how experts paraphrase text passages using condensing strategies. The overarching goal of this study was to develop machine learning algorithms to aid in the automatic detection of paraphrases and paraphrase types. To this end, three experts were instructed to paraphrase by condensing a set of target passages. The linguistic differences between the original passages and the condensed paraphrases were then analyzed using Coh-Metrix. The condensed paraphrases were accurately distinguished from the original target passages based on the number of words, word frequency, and syntactic complexity.


Arabic Cross-Document NLP for the Hadith and Biography Literature

AAAI Conferences

Cross-document integration and reconciliation of extracted information have recently become of interest to researchers in Arabic natural language processing. Given a set of documents $A$, we use Arabic morphological analysis, finite state machines, and graph transformations to extract named entities $N_a$ and relations $R_a$, expressed as edges in a graph $G = (N_a, R_a)$. We use the same techniques to extract entities $N_b$ and relations $R_b$ from a separate set of documents $B$. We use $G$ to disambiguate $N_b$ and $R_b$, and we integrate the resulting entities back into $G$ by annotating the nodes and edges in $G$ with elements from $N_b$. We apply our approach in an iterative manner. Our results show a significant increase in accuracy from 41% to 93% after applying this cross-document NLP methodology to hadith and biography documents.


Automatic Coherence Profile in Public Speeches of Three Latin American Heads-of-State

AAAI Conferences

Different studies provide evidence that the computational psycholinguistic algorithm called Latent Semantic Analysis (LSA) allows measuring local and global coherence in texts similarly to human evaluation (Foltz, Kintsch & Landauer 1998; McNamara, Cai & Louwerse 2007; McCarthy, Briner, Rus & McNamara 2007; McNamara, Louwerse & Jeuniaux 2009; Louwerse, McCarthy & Graesser 2010). The texts used in all these studies are written in English and correspond to scientific and literary texts. In Spanish, there are some studies using LSA that measure the semantic similarity between texts in automatic summary assessment (Pérez, Alfonseca, Rodríguez, Gliozzo, Strapparava & Magnini 2005; León, Olmos, Escudero, Cañas & Salmerón 2006; Venegas 2007, 2009, 2011); however, automatic measurement of coherence in Spanish has not yet been sufficiently investigated. The present study aimed at identifying a global and local coherence profile in a corpus of speeches in Spanish of three Latin American Heads-of-State (Perón, Castro and Pinochet), using Latent Semantic Analysis. Local coherence is calculated through the measurement of implicit semantic similarity between adjacent sentences and global coherence through the measurement of the similarity among the semantic content of the paragraphs. The corpus under analysis corresponds to a sample of 107 speeches. The semantic space was built using a multi-register corpus and it is available through the “Interface for the measurement of lexical-semantic similarity” in the El Grial interface (www.elgrial.cl). Results showed a systematic difference between the speeches of the Heads-of-State in terms of both local and global coherence. The Bonferroni analysis established an effect that distinguishes Perón’s speeches from Pinochet’s and Castro’s speeches. These results show that Perón’s speeches are more topically related than the other leaders’, probably due to a discourse strategy to persuade voters. The identification of a profile of coherence might be relevant to predict cues of government discourse styles.
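
As a rough illustration of the two coherence measures described in this abstract, the sketch below computes local coherence as the mean cosine similarity between adjacent sentence vectors. It assumes each sentence or paragraph has already been projected into an LSA semantic space and is given as a plain list of floats. The function names are hypothetical, and averaging over all paragraph pairs for global coherence is an assumption, since the abstract does not state how paragraph similarities are aggregated.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def local_coherence(sentence_vectors):
    """Mean cosine similarity between each sentence vector and the next one."""
    pairs = list(zip(sentence_vectors, sentence_vectors[1:]))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def global_coherence(paragraph_vectors):
    """Mean cosine similarity over all distinct paragraph pairs.

    Assumption: all-pairs averaging is one plausible aggregation,
    not necessarily the one used in the study.
    """
    pairs = list(combinations(paragraph_vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)
```

In practice the input vectors would come from a trained semantic space such as the one the authors describe; the toy vectors used for checking here are only placeholders.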


Identifying Personality Types Using Document Classification Methods

AAAI Conferences

Are the words that people use indicative of their personality type preferences? In this paper, it is hypothesized that word usage is not independent of personality type, as measured by the Myers-Briggs Type Indicator (MBTI) personality assessment tool. In-class writing samples were taken from 40 graduate students along with the MBTI. The experiment utilizes naïve Bayes classifiers and Support Vector Machines (SVMs) in an attempt to guess an individual’s personality type based on their word choice. Classification is also attempted using emotional, social, cognitive, and psychological dimensions elicited by the analysis software, Linguistic Inquiry and Word Count (LIWC). The classifiers are evaluated with 40 distinct trials (leave-one-out cross validation), and parameters are chosen using leave-one-out cross validation of each trial’s training set. The experiment showed that the naïve Bayes classifiers (word-based and LIWC-based) outperformed the SVMs when guessing Sensing-Intuition (S-N) and Thinking-Feeling (T-F).
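
The evaluation scheme in this abstract (one trial per writer, each trial training on the remaining samples) can be sketched as follows. This is a minimal sketch, assuming tokenized documents and a word-based multinomial naive Bayes classifier with add-one smoothing. The function names and the smoothing choice are illustrative assumptions rather than the authors' implementation, and the inner parameter-selection loop is omitted.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Collect per-class document counts and word counts from tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log-probability (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def leave_one_out_accuracy(docs, labels):
    """Hold out each document once, train on the rest, and report accuracy."""
    correct = 0
    for i in range(len(docs)):
        train_docs = docs[:i] + docs[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        model = train_nb(train_docs, train_labels)
        if predict_nb(model, docs[i]) == labels[i]:
            correct += 1
    return correct / len(docs)
```

With 40 writing samples, `leave_one_out_accuracy` would run 40 trials, matching the count in the abstract. Choosing parameters by a second leave-one-out pass over each trial's training set would wrap another such loop inside the outer one.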


Syntagmatic, Paradigmatic, and Automatic N-Gram Approaches to Assessing Essay Quality

AAAI Conferences

Computational indices related to n-gram production were developed in order to assess the potential for n-gram indices to predict human scores of essay quality. A regression analysis was conducted on a corpus of 313 argumentative essays. The analysis demonstrated that a variety of n-gram indices were highly correlated with essay quality, but were also highly correlated with the number of words in the text (although many of the n-gram indices were stronger predictors of writing quality than the number of words in a text). A second regression analysis was conducted on a corpus of 88 argumentative essays that were controlled for text length differences. This analysis demonstrated that n-gram indices were still strong predictors of essay quality when text length was not a factor.
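
The abstract does not define its n-gram indices, so the sketch below shows only one generic, hypothetical example: the proportion of an essay's n-grams that appear in a reference set. Because it is a proportion rather than a raw count, it is normalized by essay length, which relates to the text-length concern raised above. The function names and the reference set are assumptions made for illustration.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_proportion(tokens, reference_ngrams, n=2):
    """Share of the essay's n-grams found in a reference set.

    Hypothetical index: dividing by the number of n-grams keeps the
    value comparable across essays of different lengths.
    """
    grams = ngrams(tokens, n)
    if not grams:
        return 0.0
    hits = sum(1 for g in grams if g in reference_ngrams)
    return hits / len(grams)
```

An index of this kind could then serve as one predictor in a regression against human essay scores, alongside word count.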


Story-Level Inference and Gap Filling to Improve Machine Reading

AAAI Conferences

Machine reading aims at extracting formal knowledge representations from text to enable programs to execute some performance task, for example, diagnosis or answering complex queries stated in a formal representation language. Information extraction techniques are a natural starting point for machine reading; however, since they focus on explicit surface features at the phrase and sentence level, they generally miss information only stated implicitly. Moreover, the combination of multiple extraction results leads to error compounding, which dramatically affects extraction quality for composite structures. To address these shortcomings, we present a new approach which aggregates locally extracted information into a larger story context and uses abductive constraint reasoning to generate the best story-level interpretation. We demonstrate that this approach significantly improves formal question answering performance on complex questions.


SenticNet 2: A Semantic and Affective Resource for Opinion Mining and Sentiment Analysis

AAAI Conferences

Web 2.0 has changed the ways people communicate, collaborate, and express their opinions and sentiments. But despite social data on the Web being perfectly suitable for human consumption, they remain hardly accessible to machines. To bridge the cognitive and affective gap between word-level natural language data and the concept-level sentiments conveyed by them, we developed SenticNet 2, a publicly available semantic and affective resource for opinion mining and sentiment analysis. SenticNet 2 is built by means of sentic computing, a new paradigm that exploits both AI and Semantic Web techniques to better recognize, interpret, and process natural language opinions. By providing the semantics and sentics (that is, the cognitive and affective information) associated with over 14,000 concepts, SenticNet 2 represents one of the most comprehensive semantic resources for the development of affect-sensitive applications in fields such as social data mining, multimodal affective HCI, and social media marketing.


The Devil Is in the Details: New Directions in Deception Analysis

AAAI Conferences

In this study, we use the computational textual analysis tool, the Gramulator, to identify and examine the distinctive linguistic features of deceptive and truthful discourse. The theme of the study is abortion rights and the deceptive texts are derived from a Devil’s Advocate approach, conducted to suppress personal beliefs and values. Our study takes the form of a contrastive corpus analysis, and produces systematic differences between truthful and deceptive personal accounts. Results suggest that deceivers employ a distancing strategy that is often associated with deceptive linguistic behavior. Ultimately, these deceivers struggle to adopt a truth perspective. Perhaps of most importance, our results indicate issues of concern with current deception detection theory and methodology. From a theoretical standpoint, our results question whether deceivers are deceiving at all or whether they are merely poorly expressing a rhetorical position, caused by being forced to speculate on a perceived prototypical position. From a methodological standpoint, our results cause us to question the validity of deception corpora. Consequently, we propose new rigorous standards so as to better understand the subject matter of the deception field. Finally, we question the prevailing approach of abstract data measurement and call for future assessment to consider contextual lexical features. We conclude by suggesting a prudent approach to future research for fear that our eagerness to analyze and theorize may cause us to misidentify deception. After all, successful deception, which is the kind we seek to detect, is likely to be an elusive and fickle prey.


Special Track on Applied Natural Language Processing

AAAI Conferences

Novel human-computer interfaces, for instance talking heads, can benefit from language understanding and generation techniques, with a big impact on user satisfaction. Dialogue-based intelligent tutoring systems require advanced dialogue processing, language understanding and generation components in order to assess students' natural language inputs and provide appropriate feedback. Moreover, language can facilitate human-computer interaction for the handicapped (no typing needed) and elderly, leading to an ever increasing user base for computer systems. Some of the many areas in which the ANLP track invites contributions include multilingual processing, learning environments, multimodal communication, bioNLP, spam filtering, language acquisition (first and second), textual assessment, language varieties, materials development, generic classification, educational applications, information retrieval, speech processing, machine learning, knowledge representations, English for specific purposes, textual assessment indices, coreference resolution, word sense disambiguation, dialogue management and systems, language generation, language models, ontologies, and reasoning. For 2012, there were 15 submissions, out of which 10 were accepted as long papers and 3 as poster presentations.