AITopics

In this paper, we systematically explore feature definition and selection strategies for sentiment polarity classification. We begin by exploring basic questions, such as whether to use stemming, term frequency versus binary weighting, negation-enriched features, and n-grams or phrases. We then move on to more complex aspects, including feature selection using frequency-based vocabulary trimming, part-of-speech and lexicon selection (three types of lexicons), and expected Mutual Information (MI). Using three product and movie review datasets of various sizes, we show, for example, that some techniques are more beneficial for larger datasets than for smaller ones. A classifier trained on only a few features ranked high by MI outperformed one trained on all features in large datasets, yet this did not hold for small datasets. Finally, we perform a space and computation cost analysis to further understand the merits of various feature types.
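The MI-based feature ranking mentioned in the abstract above can be illustrated with a minimal sketch. It assumes binary term presence per document and a plug-in (count-based) estimate of expected mutual information; the function names `mutual_information` and `top_k_features` and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(docs, labels):
    """Return {term: MI in bits} between term presence/absence and class label.

    docs   -- list of token lists (one list per document)
    labels -- list of class labels, same length as docs
    """
    n = len(docs)
    class_counts = Counter(labels)
    term_counts = Counter()   # documents containing the term
    term_class = Counter()    # documents of a given class containing the term
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # binary presence, not term frequency
            term_counts[term] += 1
            term_class[(term, label)] += 1

    scores = {}
    for term, n_t in term_counts.items():
        mi = 0.0
        for label, n_c in class_counts.items():
            n_tc = term_class[(term, label)]
            # Two cells per class: term present, term absent.
            for joint, marginal in ((n_tc, n_t), (n_c - n_tc, n - n_t)):
                if joint > 0:
                    mi += (joint / n) * math.log2(joint * n / (marginal * n_c))
        scores[term] = mi
    return scores


def top_k_features(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy data (hypothetical): "a" and "b" separate the classes, "m" does not.
toy_docs = [["a", "m"], ["a", "m"], ["b", "m"], ["b", "m"]]
toy_labels = ["pos", "pos", "neg", "neg"]
toy_scores = mutual_information(toy_docs, toy_labels)
print(top_k_features(toy_scores, 2))
```

A classifier would then be trained on only the returned terms; how many to keep is a tuning choice that this sketch does not address.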

artificial intelligence, machine learning, natural language, (14 more...)

Fifth International AAAI Conference on Weblogs and Social Media

Country:

North America > United States > Iowa > Johnson County > Iowa City (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.84)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.84)

Twitter Sentiment Analysis: The Good the Bad and the OMG!

Kouloumpis, Efthymios (i-sieve Technologies) | Wilson, Theresa (Johns Hopkins University) | Moore, Johanna (University of Edinburgh)

In this paper, we investigate the utility of linguistic features for detecting the sentiment of Twitter messages. We evaluate the usefulness of existing lexical resources as well as features that capture information about the informal and creative language used in microblogging. We take a supervised approach to the problem, but leverage existing hashtags in the Twitter data for building training data.

artificial intelligence, natural language, training data, (18 more...)

Fifth International AAAI Conference on Weblogs and Social Media

Country:

North America > United States > Maryland > Baltimore (0.04)
North America > United States > Idaho (0.04)
Europe > Greece > Attica > Athens (0.04)

Industry: Information Technology > Services (0.88)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Creating Conversations: An Automated Dialog System

Gandy, Lisa (Northwestern University) | Hammond, Kristian (Northwestern University)

Online news sites often include a comments section where readers are allowed to leave their thoughts. These comments often contain interesting and insightful conversations between readers about the news article. However, the richness of these conversations is often lost among other meaningless comments, and moreover all comments are found at the bottom of the web page. In this article, we discuss how our system inserts reader conversations into the news article to create a multimedia presentation called Shout Out. Shout Out features two virtual news anchors: one anchor reads the news and, when appropriate, pauses to have a conversation about the news with the other anchor. The current iteration of Shout Out combines natural language techniques and reader conversations to create an engaging system.

artificial intelligence, machine learning, natural language, (19 more...)

Fifth International AAAI Conference on Weblogs and Social Media

Country:

North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > Illinois > Cook County > Evanston (0.04)
North America > United States > Hawaii (0.04)
(2 more...)

Genre: Questionnaire & Opinion Survey (1.00)

Industry: Media > News (1.00)

Technology:

Information Technology > Communications > Social Media (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.40)

A Machine Learning Approach to Twitter User Classification

Pennacchiotti, Marco (Yahoo! Labs) | Popescu, Ana-Maria (Yahoo! Labs)

This paper addresses the task of user classification in social media, with an application to Twitter. We automatically infer the values of user attributes such as political orientation or ethnicity by leveraging observable information such as the user behavior, network structure and the linguistic content of the user’s Twitter feed. We employ a machine learning approach which relies on a comprehensive set of features derived from such user information. We report encouraging experimental results on 3 tasks with different characteristics: political affiliation detection, ethnicity identification and detecting affinity for a particular business. Finally, our analysis shows that rich linguistic features prove consistently valuable across the 3 tasks and show great promise for additional user classification needs.

artificial intelligence, machine learning, natural language, (18 more...)

Fifth International AAAI Conference on Weblogs and Social Media

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > California > Santa Clara County > Sunnyvale (0.04)

Industry:

Information Technology > Services (0.47)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.47)

Generate Adjective Sentiment Dictionary for Social Media Sentiment Analysis Using Constrained Nonnegative Matrix Factorization

Peng, Wei (Xerox) | Park, Dae Hoon (University of Illinois at Urbana-Champaign)

Although sentiment analysis has attracted a lot of research, little work has been done on social media data compared to product and movie reviews. This is due to the low accuracy that results from the more informal writing seen in social media data. Currently, most sentiment analysis tools for social media choose the lexicon-based approach instead of the machine learning approach, because the latter faces the considerable challenge of obtaining enough human-labeled training data for extremely large-scale and diverse social opinion data. The lexicon-based approach requires a sentiment dictionary to determine opinion polarity. This dictionary can also provide useful features for any supervised learning method in the machine learning approach. However, many benchmark sentiment dictionaries do not cover the many informal and spoken words used in social media. In addition, they cannot be updated frequently enough to include newly coined online words. In this paper, we present an automatic sentiment dictionary generation method, the Constrained Symmetric Nonnegative Matrix Factorization (CSNMF) algorithm, which assigns polarity scores to each word in the dictionary, using a large social media corpus, digg.com. Moreover, we present our Amazon Mechanical Turk (AMT) study of social media word polarity, using both the human-labeled dictionaries from AMT and the General Inquirer Lexicon as references against which to compare our generated dictionary. In our experiment, we show that combining links from both WordNet and the corpus to generate sentiment dictionaries outperforms using only one of them, and that words with higher sentiment scores yield better precision. Finally, we conduct a lexicon-based sentiment analysis on human-labeled social comments using our generated sentiment dictionary to show the effectiveness of our method.

artificial intelligence, machine learning, natural language, (20 more...)

Fifth International AAAI Conference on Weblogs and Social Media

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York > Monroe County > Rochester (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Media (0.34)
Leisure & Entertainment (0.34)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
(2 more...)

arXiv.org Artificial Intelligence | Jun-9-2011

Automatically Training a Problematic Dialogue Predictor for a Spoken Dialogue System

Gorin, A., Langkilde-Geary, I., Walker, M. A., Wright, J., Hastie, H. Wright

Spoken dialogue systems promise efficient and natural access to a large variety of information sources and services from any phone. However, current spoken dialogue systems are deficient in their strategies for preventing, identifying and repairing problems that arise in the conversation. This paper reports results on automatically training a Problematic Dialogue Predictor to predict problematic human-computer dialogues using a corpus of 4692 dialogues collected with the 'How May I Help You' (SM) spoken dialogue system. The Problematic Dialogue Predictor can be immediately applied to the system's decision of whether to transfer the call to a human customer care agent, or be used as a cue to the system's dialogue manager to modify its behavior to repair problems, and even perhaps, to prevent them. We show that a Problematic Dialogue Predictor using automatically-obtainable features from the first two exchanges in the dialogue can predict problematic dialogues 13.2% more accurately than the baseline.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1613/jair.971

arXiv: 1106.1817

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

arXiv.org Artificial Intelligence | Jun-3-2011

Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System

Kearns, M., Litman, D., Singh, S., Walker, M.

Designing the dialogue policy of a spoken dialogue system involves many nontrivial choices. This paper presents a reinforcement learning approach for automatically optimizing a dialogue policy, which addresses the technical challenges in applying reinforcement learning to a working dialogue system with human users. We report on the design, construction and empirical evaluation of NJFun, an experimental spoken dialogue system that provides users with access to information about fun things to do in New Jersey. Our results show that optimizing the dialogue policy via reinforcement learning measurably improves NJFun's performance.

machine learning, natural language, reinforcement learning, (5 more...)

arXiv.org Artificial Intelligence

doi: 10.1613/jair.859

arXiv: 1106.0676

Country: North America > United States > New Jersey (0.24)

Genre: Research Report > New Finding (0.53)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence | Jun-1-2011

An Application of Reinforcement Learning to Dialogue Strategy Selection in a Spoken Dialogue System for Email

Walker, M. A.

This paper describes a novel method by which a spoken dialogue system can learn to choose an optimal dialogue strategy from its experience interacting with human users. The method is based on a combination of reinforcement learning and performance modeling of spoken dialogue systems. The reinforcement learning component applies Q-learning (Watkins, 1989), while the performance modeling component applies the PARADISE evaluation framework (Walker et al., 1997) to learn the performance function (reward) used in reinforcement learning. We illustrate the method with a spoken dialogue system named ELVIS (EmaiL Voice Interactive System), that supports access to email over the phone. We conduct a set of experiments for training an optimal dialogue strategy on a corpus of 219 dialogues in which human users interact with ELVIS over the phone. We then test that strategy on a corpus of 18 dialogues. We show that ELVIS can learn to optimize its strategy selection for agent initiative, for reading messages, and for summarizing email folders.
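The Q-learning component mentioned in the abstract above can be sketched as a standard tabular update. This is a generic illustration rather than the ELVIS implementation: the state and strategy names, the learning rate `alpha`, the discount `gamma`, and the reward value are all hypothetical, and in the paper the reward comes from a learned PARADISE performance function, which is not modeled here.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q(state, action).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    An empty next_actions list marks a terminal state (future value 0).
    """
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def greedy_strategy(q, state, actions):
    """Pick the dialogue strategy with the highest current Q-value."""
    return max(actions, key=lambda a: q[(state, a)])


# Hypothetical example: one dialogue turn where a strategy choice ends the
# dialogue and receives a reward of 1.0.
q_table = defaultdict(float)
strategies = ["system_initiative", "mixed_initiative"]
q_update(q_table, "start", "system_initiative", reward=1.0,
         next_state="end", next_actions=[])
print(greedy_strategy(q_table, "start", strategies))
```

Repeating the update over a corpus of logged dialogues would gradually shift the Q-values toward the strategies that earn higher reward.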

machine learning, natural language, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1613/jair.713

arXiv: 1106.0241

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Pennsylvania (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
(5 more...)

Genre: Research Report > New Finding (0.46)

Industry: Telecommunications (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

AAAI Conferences | May-18-2011

Adding Abstractive Reflection to a Tutorial Dialog System

Ward, Arthur (University of Pittsburgh) | Litman, Diane (University of Pittsburgh)

In this work we hypothesize that giving students a reflective reading after spoken dialog tutoring in qualitative physics will improve learning. The reading is designed to help students compare similar aspects of previously tutored problems, and to abstract their commonalities. We also hypothesize that student motivation will affect how well the text is processed, and so influence learning. We find that the beneficial effects of the reflective text significantly interact with motivation, such that moderately motivated students learn significantly more from the reflective text than from a non-reflective control text. Students with lower or higher motivation did not benefit from the reflective text. These results demonstrate that implicit reflection can improve learning after dialog tutoring with a qualitative physics tutor. They further demonstrate that this result can be obtained with a reflective/abstractive text without recourse to dialog, and that the effectiveness of the text is sensitive to the motivation level of the student.

motivation, reflection, student, (17 more...)

Twenty-Fourth International FLAIRS Conference

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
Asia > Middle East > Jordan (0.05)
North America > United States > Virginia > Chesapeake (0.04)
(2 more...)

Genre: Research Report > New Finding (0.88)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (0.85)

Technology: Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.83)

AAAI Conferences | May-18-2011

Improving Spoken Dialogue Understanding Using Phonetic Mixture Models

Wang, William Yang (Columbia University) | Artstein, Ron (USC Institute for Creative Technologies) | Leuski, Anton (USC Institute for Creative Technologies) | Traum, David (USC Institute for Creative Technologies)

Augmenting word tokens with a phonetic representation, derived from a dictionary, improves the performance of a Natural Language Understanding component that interprets speech recognizer output: we observed a 5% to 7% reduction in errors across a wide range of response return rates. The best performance comes from mixture models incorporating both word and phone features. Since the phonetic representation is derived from a dictionary, the method can be applied easily without the need for integration with a specific speech recognizer. The method has similarities with autonomous (or bottom-up) psychological models of lexical access, where contextual information is not integrated at the stage of auditory perception but rather later.

language model, tokenizer, utterance, (15 more...)

Twenty-Fourth International FLAIRS Conference

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York > Monroe County > Rochester (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
(3 more...)

Genre: Research Report (0.46)

Industry: Government (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.93)