SenticNet 3: A Common and Common-Sense Knowledge Base for Cognition-Driven Sentiment Analysis

AAAI Conferences

SenticNet is a publicly available semantic and affective resource for concept-level sentiment analysis. Rather than using graph-mining and dimensionality-reduction techniques, SenticNet 3 makes use of "energy flows" to connect various parts of extended common and common-sense knowledge representations to one another. SenticNet 3 models nuanced semantics and sentics (that is, the conceptual and affective information associated with multi-word natural language expressions), representing information with a symbolic opacity of an intermediate nature between that of neural networks and typical symbolic systems.


User Intent Identification from Online Discussions Using a Joint Aspect-Action Topic Model

AAAI Conferences

Online discussions are growing as a popular, effective and reliable source of information for users because of their liveliness, flexibility and up-to-date information. Online discussions are usually developed and advanced by groups of users with various backgrounds and intents. However, because of the diversity of topics and issues discussed by the users, supervised methods are not able to accurately model such dynamic conditions. In this paper, we propose a novel unsupervised generative model to derive aspect-action pairs from online discussions. The proposed method simultaneously captures and models these two features, together with the relationships between them that exist in each thread. We assume that each user post is generated by a mixture of aspect and action topics. Therefore, we design a model that captures the latent factors that incorporate the aspect types and intended actions, which describe how users develop a topic in a discussion. In order to demonstrate the effectiveness of our approach, we empirically compare our model against state-of-the-art methods on a large-scale discussion dataset crawled from Apple discussions, with over 3.3 million user posts from 340k discussion threads.


Role-Aware Conformity Modeling and Analysis in Social Networks

AAAI Conferences

Conformity is the inclination of a person to be influenced by others. In this paper, we study how the conformity tendency of a person changes with her role, as defined by her structural properties in a social network. We first formalize conformity using a utility function based on the conformity theory from social psychology, and validate the proposed utility function by proving the existence of Nash Equilibria when all users in a network behave according to it. We then extend and incorporate the utility function into a probabilistic topic model, called the Role-Conformity Model (RCM), for modeling user behaviors under the effect of conformity. We apply the proposed RCM to several academic research networks, and discover that people with higher degree and lower clustering coefficient are more likely to conform to others. We also evaluate RCM through the task of word usage prediction in academic publications, and show significant improvements over baseline models.
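The two structural properties named in this abstract, degree and clustering coefficient, can be computed directly from an adjacency structure. The following is a minimal sketch only, not the RCM itself: the function name `degree_and_clustering` and the dict-of-sets graph representation are assumptions made for illustration, and the graph is assumed to be undirected.

```python
from itertools import combinations


def degree_and_clustering(adj):
    """Return {node: (degree, local clustering coefficient)}.

    `adj` is a hypothetical undirected graph given as a dict that maps
    each node to the set of its neighbours.
    """
    stats = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            # With fewer than two neighbours no triangle is possible.
            stats[node] = (k, 0.0)
            continue
        # Count pairs of neighbours that are themselves connected.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        stats[node] = (k, 2 * links / (k * (k - 1)))
    return stats


# Toy graph: a triangle a-b-c with one extra node d attached to a.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
toy_stats = degree_and_clustering(toy)
```

In the toy graph, node a has the highest degree and a lower clustering coefficient than b or c, which is the kind of structural profile the abstract associates with a stronger tendency to conform.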


Emotion Classification in Microblog Texts Using Class Sequential Rules

AAAI Conferences

This paper studies the problem of emotion classification in microblog texts. Given a microblog text that consists of several sentences, we classify its emotion, if any, as anger, disgust, fear, happiness, like, sadness or surprise. Existing methods can be categorized as lexicon-based methods or machine-learning-based methods. However, due to some intrinsic characteristics of microblog texts, previous studies using these methods always get unsatisfactory results. This paper introduces a novel approach based on class sequential rules for emotion classification of microblog texts. The approach first obtains two potential emotion labels for each sentence in a microblog text, one from an emotion lexicon and one from a machine learning approach, and regards each microblog text as a data sequence. It then mines class sequential rules from the dataset and finally derives new features from the mined rules for emotion classification of microblog texts. Experimental results on a Chinese benchmark dataset show the superior performance of the proposed approach.
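A class sequential rule has the form "pattern implies class", where the pattern must occur as a subsequence of a data sequence. The sketch below shows how the support and confidence of one such rule could be evaluated. It is an illustration under stated assumptions, not the paper's procedure: the abstract does not give the exact sequence encoding, so each sequence element here is a single sentence-level label, and the names `is_subsequence`, `rule_stats` and the toy data are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is enforced.
    return all(item in it for item in pattern)


def rule_stats(pattern, label, data):
    """Support and confidence of the rule `pattern -> label`.

    `data` is a list of (sequence, class_label) pairs.
    Support: fraction of all sequences that contain the pattern and carry the label.
    Confidence: fraction of pattern-containing sequences that carry the label.
    """
    covered = [y for seq, y in data if is_subsequence(pattern, seq)]
    hits = sum(1 for y in covered if y == label)
    support = hits / len(data)
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


# Toy data: per-sentence labels for four texts, each with a text-level emotion.
toy_data = [
    (["happiness", "sadness"], "sadness"),
    (["happiness", "sadness", "sadness"], "sadness"),
    (["sadness", "happiness"], "happiness"),
    (["anger"], "anger"),
]
support, confidence = rule_stats(["happiness", "sadness"], "sadness", toy_data)
```

Rules that pass minimum support and confidence thresholds could then be turned into binary features (does the text match the rule's pattern or not), which is one plausible reading of "derives new features from the mined rules".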


Plan and Activity Recognition from a Topic Modeling Perspective

AAAI Conferences

We examine new ways to perform plan recognition (PR) using natural language processing (NLP) techniques. PR often focuses on the structural relationships between consecutive observations and ordered activities that comprise plans. However, NLP commonly treats text as a bag-of-words, omitting such structural relationships and using topic models to break down the distribution of concepts discussed in documents. In this paper, we examine an analogous treatment of plans as distributions of activities. We explore the application of Latent Dirichlet Allocation topic models to human skeletal data of plan execution traces obtained from an RGB-D sensor. This investigation focuses on representing the data as text and interpreting learned activities as a form of activity recognition (AR). Additionally, we explain how the system may perform PR. The initial empirical results suggest that such NLP methods can be useful in complex PR and AR tasks.


A Topic Model Approach to Multi-Modal Similarity

arXiv.org Machine Learning

Calculating similarities between objects defined by many heterogeneous data modalities is an important challenge in many multimedia applications. We use a multi-modal topic model as a basis for defining such a similarity between objects. We propose to compare the resulting similarities from different model realizations using the non-parametric Mantel test. The approach is evaluated on a music dataset.
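The Mantel test mentioned here compares two distance (or similarity) matrices over the same objects by correlating their off-diagonal entries and then permuting the object labels of one matrix to build a null distribution. The following is a minimal pure-Python sketch of that general procedure, not the authors' implementation; the function names, the default of 999 permutations and the one-sided p-value are assumptions made for illustration.

```python
import random
from itertools import combinations


def _pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def mantel_test(a, b, n_perm=999, seed=0):
    """Return (observed correlation, one-sided permutation p-value).

    `a` and `b` are symmetric square matrices (lists of lists) over the
    same objects, in the same order.
    """
    n = len(a)
    pairs = list(combinations(range(n), 2))  # upper triangle only
    flat_a = [a[i][j] for i, j in pairs]
    observed = _pearson(flat_a, [b[i][j] for i, j in pairs])

    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(n_perm):
        # Permute rows and columns of b together by relabelling objects.
        rng.shuffle(order)
        permuted = [b[order[i]][order[j]] for i, j in pairs]
        # Small tolerance guards against floating-point round-off.
        if _pearson(flat_a, permuted) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy example: distances between four points on a line, compared with themselves.
points = [0.0, 1.0, 3.0, 7.0]
dist = [[abs(p - q) for q in points] for p in points]
r, p_value = mantel_test(dist, dist)
```

Permuting whole objects rather than individual matrix entries is what distinguishes the Mantel test from a plain correlation test, because entries of a distance matrix are not independent of one another.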


Topic words analysis based on LDA model

arXiv.org Machine Learning

Social network analysis (SNA), a research field that describes and models the social connections of a certain group of people, is popular among network services. Our topic words analysis project is an SNA method for visualizing the topic words in emails sent from Obama.com to accounts registered in Columbus, Ohio. Based on the Latent Dirichlet Allocation (LDA) model, a popular topic model in SNA, our project characterizes the preferences of senders for a target group of recipients. Gibbs sampling is used to estimate the topic and word distributions. Our training and testing data are emails from the carbon-free server Datagreening.com. We use the parallel computing tool BashReduce for word processing and generate related words under each latent topic to discover typical information in the political news sent specifically to local Columbus recipients. Running on two instances with BashReduce, our project achieves an almost 30% speedup in processing the raw contents compared with processing them locally on one instance. In addition, the experimental results show that the LDA model applied in our project provides a precision rate 53.96% higher than the TF-IDF model in finding target words, provided that an appropriate size for the topic words list is selected.
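Estimating LDA topic and word distributions with Gibbs sampling, as described above, is commonly done with a collapsed sampler that repeatedly resamples the topic of each token from its conditional distribution. The sketch below is a small pure-Python version of that standard technique, not the project's code; the function name `lda_gibbs`, the hyperparameter defaults and the toy documents are assumptions made for illustration.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA.

    `docs` is a list of token lists. Returns (vocab, phi, theta), where
    phi[k][v] is the probability of word v under topic k and
    theta[d][k] is the probability of topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][wid[w]] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = wid[w]
                z = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Conditional distribution over topics for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + n_words * beta)
         for v in range(n_words)]
        for k in range(n_topics)
    ]
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
         for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    return vocab, phi, theta


# Hypothetical toy corpus standing in for tokenized email bodies.
toy_docs = [
    ["vote", "campaign", "vote", "rally"],
    ["donate", "campaign", "donate", "fund"],
    ["rally", "vote", "volunteer"],
]
vocab, phi, theta = lda_gibbs(toy_docs, n_topics=2, n_iter=50)
```

A topic words list of a chosen size can then be read off by sorting each row of `phi` and taking the highest-probability words, which corresponds to the list size the abstract says must be selected appropriately.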


Common and Common-Sense Knowledge Integration for Concept-Level Sentiment Analysis

AAAI Conferences

In the era of Big Data, knowledge integration is key for tasks such as social media aggregation, opinion mining, and cyber-issue detection. The integration of different kinds of knowledge coming from multiple sources, however, is often a problematic issue as it either requires a lot of manual effort in defining aggregation rules or suffers from noise generated by automatic integration techniques. In this work, we propose a method based on conceptual primitives for efficiently integrating pieces of knowledge coming from different common and common-sense resources, which we test in the field of concept-level sentiment analysis.


SMART Electronic Legal Discovery Via Topic Modeling

AAAI Conferences

Electronic discovery is an interesting subproblem of information retrieval in which one identifies documents that are potentially relevant to issues and facts of a legal case from an electronically stored document collection (a corpus). In this paper, we consider representing documents in a topic space using well-known topic models such as latent Dirichlet allocation and latent semantic indexing, and solving the information retrieval problem by finding document similarities in the topic space rather than in the corpus vocabulary space. We also develop an iterative SMART ranking and categorization framework that includes a human in the loop to label a set of seed (training) documents and uses them to build a semi-supervised binary document classification model based on Support Vector Machines. To improve this model, we propose a method for choosing seed documents from the whole population via an active learning strategy. We report the results of our experiments on a real dataset in the electronic discovery domain.


A Survey of Data Mining Techniques for Social Media Analysis

arXiv.org Artificial Intelligence

Social networks have gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook, LinkedIn and Google+ through the internet and Web 2.0 technologies has become more affordable. People are becoming more interested in, and reliant on, social networks for information, news and the opinions of other users on diverse subject matters. The heavy reliance on social network sites causes them to generate massive data characterised by three computational issues, namely size, noise and dynamism. These issues often make social network data very complex to analyse manually, which makes computational means of analysis pertinent. Data mining provides a wide range of techniques for detecting useful knowledge, such as trends, patterns and rules, from massive datasets [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis, and data interpretation processes in the course of data analysis. This survey discusses the different data mining techniques used over the decades to mine diverse aspects of social networks, from historical techniques to up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in Table 1, together with the tools employed and the names of their authors.