Goto

Collaborating Authors

 Genre


Transfer Learning for Multiple-Domain Sentiment Analysis — Identifying Domain Dependent/Independent Word Polarity

AAAI Conferences

Sentiment analysis is the task of determining the attitude (positive or negative) of documents. While the polarity of words in the documents is informative for this task, polarity of some words cannot be determined without domain knowledge. Detecting word polarity thus poses a challenge for multiple-domain sentiment analysis. Previous approaches tackle this problem with transfer learning techniques, but they cannot handle multiple source domains and multiple target domains. This paper proposes a novel Bayesian probabilistic model to handle multiple source and multiple target domains. In this model, each word is associated with three factors: Domain label, domain dependence/independence and word polarity. We derive an efficient algorithm using Gibbs sampling for inferring the parameters of the model, from both labeled and unlabeled texts. Using real data, we demonstrate the effectiveness of our model in a document polarity classification task compared with a method not considering the differences between domains. Moreover our method can also tell whether each word's polarity is domain-dependent or domain-independent. This feature allows us to construct a word polarity dictionary for each domain.


Temporal Dynamics of User Interests in Tagging Systems

AAAI Conferences

Collaborative tagging systems are now deployed extensivelyto help users share and organize resources.Tag prediction and recommendation systems generallymodel user behavior as research has shown that accuracycan be significantly improved by modeling users’preferences. However, these preferences are usuallytreated as constant over time, neglecting the temporalfactor within users’ interests. On the other hand, littleis known about how this factor may influence predictionin social bookmarking systems. In this paper, weinvestigate the temporal dynamics of user interests intagging systems and propose a user-tag-specific temporalinterests model for tracking users’ interests overtime. Additionally, we analyze the phenomenon of topicswitches in social bookmarking systems, showing that atemporal interests model can benefit from the integrationof topic switch detection and that temporal characteristicsof social tagging systems are different fromtraditional concept drift problems. We conduct experimentson three public datasets, demonstrating the importanceof personalization and user-tag specializationin tagging systems. Experimental results show that ourmethod can outperform state-of-the-art tag predictionalgorithms. We also incorporate our model within existingcontent-based methods yielding significant improvementsin performance.


Analyzing and Predicting Not-Answered Questions in Community-based Question Answering Services

AAAI Conferences

This paper focuses on analyzing and predicting not-answered questions in Community based Question Answering (CQA) services, such as Yahoo! Answers. In CQA services, users express their information needs by submitting natural language questions and await answers from other human users. Comparing to receiving results from web search engines using keyword queries, CQA users are likely to get more specific answers, because human answerers may catch the main point of the question. However, one of the key problems of this pattern is that sometimes no one helps to give answers, while web search engines hardly fail to response. In this paper, we analyze the not-answered questions and give a first try of predicting whether questions will receive answers. More specifically, we first analyze the questions of Yahoo Answers based on the features selected from different perspectives. Then, we formalize the prediction problem as supervised learning – binary classification problem and leverage the proposed features to make predictions. Extensive experiments are made on 76,251 questions collected from Yahoo! Answers. We analyze the specific characteristics of not-answered questions and try to suggest possible reasons for why a question is not likely to be answered. As for prediction, the experimental results show that classification based on the proposed features outperforms the simple word-based approach significantly.


Predicting Author Blog Channels with High Value Future Posts for Monitoring

AAAI Conferences

The phenomenal growth of social media, both in scale and importance, has created a unique opportunity to track information diffusion and the spread of influence, but can also make efficient tracking difficult. Given data streams representing blog posts on multiple blog channels and a focal query post on some topic of interest, our objective is to predict which of those channels are most likely to contain a future post that is relevant, or similar, to the focal query post. We denote this task as the future author prediction problem (FAPP). This problem has applications in information diffusion for brand monitoring and blog channel personalization and recommendation. We develop prediction methods inspired by (naive) information retrieval approaches that use historical posts in the blog channel for prediction. We also train a ranking support vector machine (SVM) to solve the problem. We evaluate our methods on an extensive social media dataset; despite the difficulty of the task, all methods perform reasonably well. Results show that ranking SVM prediction can exploit blog channel and diffusion characteristics to improve prediction accuracy. Moreover, it is surprisingly good for prediction in emerging topics and identifying inconsistent authors.


Integrating Community Question and Answer Archives

AAAI Conferences

Question and answer pairs in Community Question Answering (CQA) services are organized into hierarchical structures or taxonomies to facilitate users to find the answers for their questions conveniently. We observe that different CQA services have their own knowledge focus and used different taxonomies to organize their question and answer pairs in their archives. As there are no simple semantic mappings between the taxonomies of the CQA services, the integration of CQA services is a challenging task. The existing approaches on integrating taxonomies ignore the hierarchical structures of the source taxonomy. In this paper, we propose a novel approach that is capable of incorporating the parent-child and sibling information in the hierarchical structures of the source taxonomy for accurate taxonomy integration. Our experimental results with real world CQA data demonstrate that the proposed method significantly outperforms state-of-the-art methods.


Cross-Language Latent Relational Search: Mapping Knowledge across Languages

AAAI Conferences

Latent relational search (LRS) is a novel approach for mapping knowledge across two domains. Given a source domain knowledge concerning the Moon, "The Moon is a satellite of the Earth," one can form a question {(Moon, Earth), (Ganymede, ?)} to query an LRS engine for new knowledge in the target domain concerning the Ganymede. An LRS engine relies on some supporting sentences such as ``Ganymede is a natural satellite of Jupiter.'' to retrieve and rank "Jupiter" as the first answer. This paper proposes cross-language latent relational search (CLRS) to extend the knowledge mapping capability of LRS from cross-domain knowledge mapping to cross-domain and cross-language knowledge mapping. In CLRS, the supporting sentences for the source pair might be in a different language with that of the target pair. We represent the relation between two entities in an entity pair by lexical patterns of the context surrounding the two entities. We then propose a novel hybrid lexical pattern clustering algorithm to capture the semantic similarity between paraphrased lexical patterns across languages. Experiments on Japanese-English datasets show that the proposed method achieves an MRR of 0.579 for CLRS task, which is comparable to the MRR of an existing monolingual LRS engine.


Generating True Relevance Labels in Chinese Search Engine Using Clickthrough Data

AAAI Conferences

In current search engines, ranking functions are learned from a large number of labeled <query, URL> pairs in which the labels are assigned by human judges, describing how well the URLs match the different queries. However in commercial search engines, collecting high quality labels is time-consuming and labor-intensive. To tackle this issue, this paper studies how to produce the true relevance labels for  <query, URL> pairs using clickthrough data. By analyzing the correlations between query frequency, true relevance labels and users’ behaviors, we demonstrate that the users who search the queries with similar frequency have similar search intents and behavioral characteristics. Based on such properties, we propose an efficient discriminative parameter estimation in a multiple instance learning algorithm (MIL) to automatically produce true relevance labels for  <query, URL> pairs. Furthermore, we test our approach using a set of real world data extracted from a Chinese commercial search engine. Experimental results not only validate the effectiveness of the proposed approach, but also indicate that our approach is more likely to agree with the aggregation of the multiple judgments when strong disagreements exist in the panel of judges. In the event that the panel of judges is consensus, our approach provides more accurate automatic label results. In contrast with other models, our approach effectively improves the correlation between automatic labels and manual labels.


Fast Query Recommendation by Search

AAAI Conferences

Query recommendation can not only effectively facilitate users to obtain their desired information but alsoincrease ads’ click-through rates. This paper presentsa general and highly efficient method for query recommendation. Given query sessions, we automatically generate many similar and dissimilar query-pairs as the prior knowledge. Then we learn a transformation from the prior knowledge to move similar queries closer such that similar queries tend to have similar hash values.This is formulated as minimizing the empirical error on the prior knowledge while maximizing the gap between the data and some partition hyperplanes randomly generated in advance. In the recommendation stage, we search queries that have similar hash values to the given query, rank the found queries and return the top K queries as the recommendation result. All the experimental results demonstrate that our method achieves encouraging results in terms of efficiency and recommendation performance.


Active Dual Collaborative Filtering with Both Item and Attribute Feedback

AAAI Conferences

The new user problem (aka user cold start) is very common in online recommender systems. Active collaborative filtering (active CF) tries to solve this problem by intelligently soliciting user feedback in order to build an initial user profile with minimal costs. Existing methods only query the user for feedback on items, while users can have preferences over items as well as certain item attributes. In this paper, we extend active CF via user feedback on both items and attributes. For example, when making movie recommendations, the system can ask users for not only their favorite movies, but also attributes such as genres, actors, etc. We design a unified active CF framework for incorporating both item and attribute feedback based on the random walk model. We test the active CF algorithm on real-world movie recommendation data sets to demonstrate that appropriately querying for both item and feature feedback can significantly reduce the overall user effort measured in terms of number of queries. We show that we can achieve much better recommendation quality as compared to traditional active CF methods that support only item feedback.


Commonsense Causal Reasoning Using Millions of Personal Stories

AAAI Conferences

The personal stories that people write in their Internet weblogs include a substantial amount of information about the causal relationships between everyday events. In this paper we describe our efforts to use millions of these stories for automated commonsense causal reasoning. Casting the commonsense causal reasoning problem as a Choice of Plausible Alternatives, we describe four experiments that compare various statistical and information retrieval approaches to exploit causal information in story corpora. The top performing system in these experiments uses a simple co-occurrence statistic between words in the causal antecedent and consequent, calculated as the Pointwise Mutual Information between words in a corpus of millions of personal stories.