microtext
Analyzing Microtext
Much progress has been made in recent years in several areas within natural language processing. However, so far there has been less work related to microtext (for example, instant messaging, transcribed speech, and microblogs such as Twitter and Facebook). Microtext is made up of semistructured pieces of text that are distinguished by their brevity, informality, idiosyncratic lexicon, nonstandard grammar, misspelling, use of emoticons, and sometimes simultaneous interwoven conversation.
Reports of the 2013 AAAI Spring Symposium Series
Markman, Vita (Disney Interactive Studios) | Stojanov, Georgi (American University of Paris) | Indurkhya, Bipin (International Institute of Information Technology) | Kido, Takashi (Rikengenesis) | Takadama, Keiki (University of Electro-Communications) | Konidaris, George (Massachusetts Institute of Technology) | Eaton, Eric (Bryn Mawr College) | Matsumura, Naohiro (Osaka University) | Fruchter, Renate (Stanford University) | Sofge, Donald (Naval Research Laboratory) | Lawless, William (Paine College) | Madani, Omid (Google) | Sukthankar, Rahul (Google)
The Association for the Advancement of Artificial Intelligence was pleased to present the AAAI 2013 Spring Symposium Series, held Monday through Wednesday, March 25-27, 2013. The titles of the eight symposia were Analyzing Microtext; Creativity and (Early) Cognitive Development; Data-Driven Wellness: From Self-Tracking to Behavior Change; Designing Intelligent Robots: Reintegrating AI II; Lifelong Machine Learning; Shikakeology: Designing Triggers for Behavior Change; Trust and Autonomous Systems; and Weakly Supervised Learning from Multimedia. This report contains summaries of the symposia, written, in most cases, by the cochairs of the symposium.
Dynamic Microcluster Chains in Microtext
Robinson, Jason R. (The MITRE Corporation) | Condon, Sherri Lee (The MITRE Corporation)
Two features of microtext that challenge language processing tools are addressed in the context of linking messages in the emergency response domain. First, the effect of very short texts on several classifiers is estimated by comparing the results when classifiers are applied to the full text of news reports vs. only the headlines. These experiments demonstrate a decrease of 5-20% in accuracy. A second challenging feature of microtexts is their accumulation in real time, which can be massive for sources such as Twitter. A dynamic hierarchical clustering algorithm that clusters messages as they accumulate is described, and a preliminary experiment in clustering tweets is demonstrated.
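To make the idea of clustering messages as they accumulate concrete, here is a minimal sketch of single-pass incremental clustering. It is not the authors' dynamic hierarchical algorithm: it is flat rather than hierarchical, and the class name `IncrementalClusterer`, the Jaccard token-overlap measure, and the `threshold` value are all assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Token-set overlap in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class IncrementalClusterer:
    """Assigns each arriving message to the closest cluster or opens a new one.

    Hypothetical illustration only: a flat, single-pass scheme, much simpler
    than a dynamic hierarchical algorithm.
    """

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed cut-off, not taken from the paper
        self.clusters = []          # each: {"vocab": set, "messages": list}

    def add(self, message):
        """Place one message and return the index of its cluster."""
        tokens = set(message.lower().split())
        best_index, best_score = None, 0.0
        for index, cluster in enumerate(self.clusters):
            score = jaccard(tokens, cluster["vocab"])
            if score > best_score:
                best_index, best_score = index, score
        if best_index is not None and best_score >= self.threshold:
            cluster = self.clusters[best_index]
            cluster["messages"].append(message)
            cluster["vocab"] |= tokens  # cluster vocabulary grows over time
            return best_index
        self.clusters.append({"vocab": tokens, "messages": [message]})
        return len(self.clusters) - 1
```

Each call to `add` touches every existing cluster once, so the cost per message grows with the number of clusters; a hierarchical scheme is one way to keep that lookup manageable on high-volume streams.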
A CCG-Based Approach to Fine-Grained Sentiment Analysis in Microtext
Smith, Phillip (University of Birmingham) | Lee, Mark (University of Birmingham)
In this paper, we present a Combinatory Categorial Grammar (CCG) based approach to the classification of emotion in microtext. We develop a method that makes use of the notion put forward by Ortony, Clore, and Collins (1988), that emotions are valenced reactions. This hypothesis is central to our system, in which we adapt contextual valence shifters to infer the emotional content of a text. We integrate this with an augmented version of WordNet-Affect, which acts as our lexicon. Finally, we experiment with a corpus of headlines proposed in the 2007 SemEval Affective Task (Strapparava and Mihalcea 2007) as our microtext corpus, and by taking the other competing systems as a baseline, demonstrate that our approach to emotion categorisation performs favourably.
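As a rough illustration of contextual valence shifters, the sketch below lets negators flip and intensifiers or diminishers rescale the prior valence of the next valenced word. It is a flat token scan, not a CCG derivation, and the tiny lexicon and weights are hypothetical stand-ins for the augmented WordNet-Affect lexicon the abstract mentions.

```python
# Hypothetical mini-lexicon and shifter tables, invented for illustration.
VALENCE = {"happy": 1.0, "win": 1.0, "good": 0.5,
           "sad": -1.0, "crash": -1.0, "fear": -1.0}
NEGATORS = {"not", "no", "never"}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}
DIMINISHERS = {"slightly": 0.5, "barely": 0.25}


def shifted_valence(text):
    """Sum word valences after applying any pending shifters to each one."""
    total = 0.0
    sign, scale = 1.0, 1.0  # pending shifters waiting for a valenced word
    for token in text.lower().split():
        if token in NEGATORS:
            sign = -sign
        elif token in INTENSIFIERS:
            scale *= INTENSIFIERS[token]
        elif token in DIMINISHERS:
            scale *= DIMINISHERS[token]
        elif token in VALENCE:
            total += sign * scale * VALENCE[token]
            sign, scale = 1.0, 1.0  # shifters are consumed once applied
        # Neutral tokens leave pending shifters in place.
    return total


print(shifted_valence("not very happy"))  # -1.5
```

A grammar-driven system would decide the scope of each shifter from the syntactic derivation; the scan above simply assumes a shifter applies to the next valenced word.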
Normalizing Microtext
Xue, Zhenzhen (Lehigh University) | Yin, Dawei (Lehigh University) | Davison, Brian D. (Lehigh University)
The use of computer-mediated communication has resulted in a new form of written text, microtext, which is very different from well-written text. Tweets and SMS messages, which have limited length and may contain misspellings, slang, or abbreviations, are two typical examples of microtext. Microtext poses new challenges to standard natural language processing tools, which are usually designed for well-written text. The objective of this work is to normalize microtext in order to produce text that is suitable for further processing. We propose a normalization approach based on the source-channel model, which incorporates four factors: an orthographic factor, a phonetic factor, a contextual factor, and acronym expansion. Experiments show that our approach can normalize Twitter messages reasonably well, and it outperforms existing algorithms on a public SMS data set.
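In a source-channel (noisy-channel) formulation, the normalized form of a noisy token s is the clean word t that maximizes P(t) * P(s | t). The sketch below shows one way that decision rule could be wired up. It is not the authors' model: the word counts, the acronym table, the vowel-dropping phonetic key, and the `SHARPNESS` constant are all assumptions, and a unigram prior stands in for the contextual factor.

```python
import difflib
import math

# Hypothetical toy resources, invented for illustration.
WORD_COUNTS = {"tomorrow": 50, "see": 120, "you": 300, "great": 80, "to": 400}
ACRONYMS = {"u": "you", "2moro": "tomorrow", "gr8": "great"}
TOTAL = sum(WORD_COUNTS.values())
SHARPNESS = 10.0  # assumed penalty weight for dissimilar candidates


def skeleton(word):
    """Crude phonetic key: keep the first character, drop later vowels."""
    return word[:1] + "".join(c for c in word[1:] if c not in "aeiou")


def similarity(a, b):
    return difflib.SequenceMatcher(None, a, b).ratio()


def channel_log_prob(noisy, clean):
    """Stand-in for log P(noisy | clean), built from three channel factors."""
    orthographic = similarity(noisy, clean)
    phonetic = similarity(skeleton(noisy), skeleton(clean))
    acronym = 1.0 if ACRONYMS.get(noisy) == clean else 0.0
    return -SHARPNESS * (1.0 - max(orthographic, phonetic, acronym))


def normalize_token(noisy):
    """Return the clean word maximizing log P(clean) + log P(noisy | clean)."""
    def score(clean):
        prior = math.log(WORD_COUNTS[clean] / TOTAL)
        return prior + channel_log_prob(noisy, clean)
    return max(WORD_COUNTS, key=score)


def normalize(message):
    return " ".join(normalize_token(tok) for tok in message.lower().split())


print(normalize("see u 2moro"))  # see you tomorrow
```

Taking the maximum of the three factors is a design shortcut: it lets whichever evidence is strongest decide. A fuller model would estimate each factor's probability from data and condition the prior on neighbouring words.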
Learning Ontologies from the Web for Microtext Processing
Galitsky, Boris (University of Girona) | Dobrocsi, Gabor Boris (University of Girona) | Rosa, Josep Lluis de la (University of Girona)
We build a mechanism to form an ontology of entities that improves the relevance of matching and searching microtext. Ontology construction starts from seed entities and mines the web for new entities associated with them. To form these new entities, machine learning of syntactic parse trees (syntactic generalization) is applied to find commonalities between various search results for existing entities on the web. Ontology and syntactic generalization are applied to relevance improvement in search and text similarity assessment in a commercial setting; evaluation results show a substantial contribution of both sources to microtext processing.
Analysis of C2 and “C2-Lite” Micro-Message Communications
Duchon, Andrew (Aptima, Inc.) | McCormack, Robert (Aptima, Inc.) | Riordan, Brian (Aptima, Inc.) | Shabarekh, Charlotte (Aptima, Inc.) | Weil, Shawn (Aptima, Inc.) | Yohai, Ian (Aptima, Inc.)
Microtext media (Ellen, 2011), such as SMS, IM, Twitter, and text chat, have in common that they use short strings for immediate communication or broadcast. Microtext can be construed as one form of micro-messaging (e.g., Milstein, et al., 2008) which we extend here to include any of a number of other modalities (e.g., telephone calls, face-to-face interaction) used for short, immediate and (potentially) persistent message passing among coordinating agents. In this paper, we describe several recent attempts to study micro-messaging in military and related organizational contexts. [...] Rather, the goal is to gather relevant messages, organize them, and extract some other kind of useful information from them, such as how well a team is performing or what people are talking about and when. However, micro-messages do not exist in a vacuum; they are contextually oriented and may be part of a larger network of communications which includes email, telephone and other media, including "macro-text." Given this, we have found that natural language processing of the microtext must be paired with temporal or network analysis of the context. To demonstrate this process, we [...]