AITopics

Eighth International AAAI Conference on Weblogs and Social Media

Industry:

Education > Educational Setting (0.73)
Information Technology > Services (0.60)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.60)

AAAI Conferences, Nov-14-2013

Web Scale Information Extraction with LODIE

Gentile, Anna Lisa (University of Sheffield) | Zhang, Ziqi (University of Sheffield) | Ciravegna, Fabio (University of Sheffield)

Information Extraction (IE) is the technique for transforming unstructured textual data into a structured representation that can be understood by machines. The exponential growth of the Web generates an exceptional quantity of data for which automatic knowledge capture is essential. This work describes the methodology for Web scale Information Extraction adopted by the LODIE project (Linked Open Data Information Extraction). LODIE aims to develop Information Extraction techniques able to (i) scale at web level and (ii) adapt to users' information needs. The core idea behind LODIE is the usage of Linked Open Data, a very large-scale information resource, as a ground-breaking solution for IE, which provides invaluable annotated data on a growing number of domains.

lodie, scale information extraction

2013 AAAI Fall Symposium Series

Technology:

Information Technology > Data Science > Data Mining > Text Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)

Detecting and Tracking Disease Outbreaks by Mining Social Media Data

detecting and tracking disease outbreak, mining social media data

The emergence and ubiquity of online social networks have enriched web data with evolving interactions and communities both at mega-scale and in real-time. This data offers an unprecedented opportunity for studying the interaction between society and disease outbreaks. The challenge we describe in this data paper is how to extract and leverage epidemic outbreak insights from massive amounts of social media data and how this exercise can benefit medical professionals, patients, and policymakers alike. We attempt to prepare the research community for this challenge with four datasets. Publishing the four datasets will commoditize the data infrastructure to allow a higher and more efficient focal point for the research community.

Twenty-Third International Joint Conference on Artificial Intelligence

Industry: Health & Medicine > Epidemiology (0.89)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.40)

Xia, Rui (Nanjing University of Science and Technology) | Hu, Xuelei (Nanjing University of Science and Technology) | Lu, Jianfeng (Nanjing University of Science and Technology) | Yang, Jian (Nanjing University of Science and Technology) | Zong, Chengqing (National Laboratory of Pattern Recognition, Institute of Automation)

Instance Selection and Instance Weighting for Cross-Domain Sentiment Classification via PU Learning

Due to the explosive growth of online reviews on the Internet, we can easily collect a large amount of labeled reviews from different domains. But only some of them are beneficial for training a desired target-domain sentiment classifier. Therefore, it is important to identify the samples that are most relevant to the target domain and use them as training data. To address this problem, a novel approach, based on instance selection and instance weighting via PU learning, is proposed. PU learning is first used to learn an in-target-domain selector, which assigns an in-target-domain probability to each sample in the training set. For instance selection, the samples with higher in-target-domain probability are used as training data; for instance weighting, the calibrated in-target-domain probabilities are used as sampling weights for training an instance-weighted naive Bayes model, based on the principle of maximum weighted likelihood estimation. The experimental results prove the necessity and effectiveness of the approach, especially when the size of training data is large. It is also proved that the larger the Kullback-Leibler divergence between the training and test data is, the more effective the proposed approach will be.
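The instance-weighting idea in this abstract can be illustrated with a minimal sketch. It assumes a multinomial naive Bayes model in which each training document contributes fractional counts equal to its weight, which is one common reading of maximum weighted likelihood estimation; the abstract itself does not give the formulas. The function names, the Laplace smoothing parameter `alpha`, and the toy weights are all hypothetical, and the weights are assumed to be positive in-target-domain probabilities produced elsewhere.

```python
import math
from collections import Counter, defaultdict

def train_weighted_nb(docs, labels, weights, alpha=1.0):
    """Fit multinomial naive Bayes where each document counts `weight` times.

    docs: list of token lists; labels: class per doc; weights: positive floats
    (assumed here to be in-target-domain probabilities from a separate selector).
    """
    class_mass = defaultdict(float)   # weighted number of documents per class
    term_mass = defaultdict(Counter)  # weighted token counts per class
    vocab = set()
    for tokens, label, w in zip(docs, labels, weights):
        class_mass[label] += w
        for tok in tokens:
            term_mass[label][tok] += w
            vocab.add(tok)
    total = sum(class_mass.values())
    model = {}
    for label, mass in class_mass.items():
        # Laplace-smoothed weighted likelihoods (alpha is an assumption).
        denom = sum(term_mass[label].values()) + alpha * len(vocab)
        log_lik = {t: math.log((term_mass[label][t] + alpha) / denom) for t in vocab}
        model[label] = (math.log(mass / total), log_lik)
    return model

def predict(model, tokens):
    """Return the class with the highest log posterior; unseen tokens are skipped."""
    def score(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(log_lik[t] for t in tokens if t in log_lik)
    return max(model, key=score)

# Toy data: the last review is judged unlikely to be in the target domain.
docs = [["great", "battery"], ["poor", "battery"], ["great", "screen"], ["poor", "plot"]]
labels = ["pos", "neg", "pos", "neg"]
weights = [0.9, 0.8, 0.9, 0.1]
model = train_weighted_nb(docs, labels, weights)
```

Setting every weight to 1 recovers ordinary multinomial naive Bayes, and thresholding the weights to 0 or 1 corresponds to hard instance selection, except that a class whose weights all drop to 0 has no prior mass and would need to be left out before training.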

cross-domain sentiment classification, pu learning, weighting

Twenty-Third International Joint Conference on Artificial Intelligence

Genre: Research Report (0.53)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Quality > Instance Selection (0.80)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.69)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.69)

Moro, Andrea (Sapienza Università di Roma) | Navigli, Roberto (Sapienza Università di Roma)

Integrating Syntactic and Semantic Analysis into the Open Information Extraction Paradigm

integrating syntactic and semantic analysis, open information extraction paradigm

In this paper we present an approach aimed at enriching the Open Information Extraction paradigm with semantic relation ontologization by integrating syntactic and semantic features into its workflow. To achieve this goal, we combine deep syntactic analysis and distributional semantics using a shortest path kernel method and soft clustering. The output of our system is a set of automatically discovered and ontologized semantic relations.

Twenty-Third International Joint Conference on Artificial Intelligence

Technology:

Information Technology > Data Science > Data Mining > Text Mining (0.60)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.60)

Active Learning for Cross-domain Sentiment Classification

Li, Shoushan (Soochow University) | Xue, Yunxia (Soochow University) | Wang, Zhongqing (Soochow University) | Zhou, Guodong (Soochow University)

In the literature, various approaches have been proposed to address the domain adaptation problem in sentiment classification (also called cross-domain sentiment classification). However, the adaptation performance normally suffers considerably when the data distributions in the source and target domains differ significantly. In this paper, we suggest performing active learning for cross-domain sentiment classification by actively selecting a small amount of labeled data in the target domain. Accordingly, we propose a novel active learning approach for cross-domain sentiment classification. First, we train two individual classifiers, i.e., the source and target classifiers, with the labeled data from the source and target domains respectively. Then, the two classifiers are employed to select informative samples with the selection strategy of Query By Committee (QBC). Third, the two classifiers are combined to make the classification decision. Importantly, the two classifiers are trained by fully exploiting the unlabeled data in the target domain with the label propagation (LP) algorithm. Empirical studies demonstrate the effectiveness of our active learning approach for cross-domain sentiment classification over some strong baselines.
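The Query By Committee step can be sketched as follows. This is an illustration only: the abstract does not say how disagreement between the source and target classifiers is measured, so the sketch assumes each classifier outputs a positive-class probability per unlabeled sample and ranks samples by the absolute gap between the two. The function name and the example probabilities are hypothetical.

```python
def select_by_committee(source_probs, target_probs, k):
    """Return indices of the k samples on which the two classifiers disagree most.

    source_probs, target_probs: positive-class probabilities from the source
    and target classifiers for the same unlabeled target-domain samples.
    Disagreement is the absolute probability gap (an assumed measure).
    """
    disagreement = [abs(ps - pt) for ps, pt in zip(source_probs, target_probs)]
    ranked = sorted(range(len(disagreement)), key=lambda i: disagreement[i], reverse=True)
    return ranked[:k]

# Hypothetical classifier outputs for four unlabeled reviews.
source_probs = [0.9, 0.2, 0.6, 0.5]
target_probs = [0.8, 0.9, 0.4, 0.5]
to_label = select_by_committee(source_probs, target_probs, k=2)
```

The selected indices would then be sent for manual labeling and added to the target-domain training set before both classifiers are retrained.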

active learning, cross-domain sentiment classification

Twenty-Third International Joint Conference on Artificial Intelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

From Semantic to Emotional Space in Probabilistic Sense Sentiment Analysis

Mohtarami, Mitra (National University of Singapore) | Lan, Man (Institute for Infocomm Research) | Tan, Chew Lim (National University of Singapore)

This paper proposes an effective approach to model the emotional space of words to infer their Sense Sentiment Similarity (SSS). SSS reflects the distance between the words regarding their senses and underlying sentiments. We propose a probabilistic approach that is built on a hidden emotional model in which the basic human emotions are considered as hidden. This leads to predicting a vector of emotions for each sense of the words, and then to inferring the sense sentiment similarity. The effectiveness of the proposed approach is investigated in two Natural Language Processing tasks: Indirect yes/no Question Answer Pairs Inference and Sentiment Orientation Prediction.
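To make the step from emotion vectors to a similarity score concrete, here is a small sketch. It assumes each word sense has already been mapped to a vector of emotion scores and uses cosine similarity as a stand-in comparison; the abstract does not state which measure the authors use, and the three-dimensional vectors below are invented for illustration.

```python
import math

def emotion_similarity(u, v):
    """Cosine similarity between two emotion vectors (an assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical emotion vectors over (joy, anger, sadness) for three word senses.
delighted = [0.9, 0.0, 0.1]
pleased = [0.8, 0.1, 0.1]
furious = [0.0, 0.9, 0.1]
```

Under this assumption, senses with similar emotional profiles score close to 1 and senses with unrelated profiles score close to 0.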

emotion, sentiment similarity, similarity, (15 more...)

Twenty-Seventh AAAI Conference on Artificial Intelligence

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Emotion (0.71)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.70)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.65)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.65)

Co-Training Based Bilingual Sentiment Lexicon Learning

Gao, Dehong (The Hong Kong Polytechnic University) | Wei, Furu (Microsoft Research Asia, Beijing) | Li, Wenjie (The Hong Kong Polytechnic University) | Liu, Xiaohua (Microsoft Research Asia, Beijing) | Zhou, Ming (Microsoft Research Asia, Beijing)

In this paper, we address the issue of bilingual sentiment lexicon learning (BSLL), which aims to automatically and simultaneously generate sentiment words for two languages. The underlying motivation is that sentiment information from two languages can perform iterative mutual-teaching in the learning procedure. We propose to develop two classifiers to determine the sentiment polarities of words under a co-training framework, which makes full use of the two-view sentiment information from the two languages. The word alignment derived from the parallel corpus is leveraged to design effective features and to bridge the learning of the two classifiers. The experimental results on English and Chinese languages show the effectiveness of our approach in BSLL.

artificial intelligence, bilingual sentiment lexicon learning, natural language, (1 more...)

Workshops at the Twenty-Seventh AAAI Conference on Artificial Intelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.89)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.89)

Budlong, Emily (Air Force Research Laboratory (AFRL)) | Pine, Carrie (Air Force Research Laboratory (AFRL)) | Zappavigna, Mark (Air Force Research Laboratory (AFRL)) | Homer, James (National Air and Space Intelligence Center (NASIC)) | Proefrock, Charles (General Dynamic Information Technology (GDIT)) | Gucwa, John (General Dynamic Information Technology (GDIT)) | Crystal, Michael (Raytheon BBN Technologies) | Weischedel, Ralph (Raytheon BBN Technologies)

Interactive Information Extraction and Navigation to Enable Effective Link Analysis and Visualization of Unstructured Text

This paper describes the Advanced Text Exploitation Assistant (ATEA), a system developed to enable intelligence analysts to perform link analysis and visualization (A&V) from information in large volumes of unstructured text. One of the key design challenges that had to be addressed was that of imperfect Information Extraction (IE) technology. While IE seems like a promising candidate for exploiting information in unstructured text, it makes mistakes. As a result, analysts do not trust its results. In this paper, we discuss how ATEA overcomes the obstacle of imperfect IE by incorporating a human-in-the-loop for review and correction of extraction results. We also discuss how coupling consolidated extraction results (corpus-level information objects) with an intuitive user interface facilitates interactive navigation of the resulting information. With these key features, ATEA enables effective link analysis and visualization of information in unstructured text.

data mining, interactive information extraction and navigation, natural language, (6 more...)

Twenty-Fifth IAAI Conference

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Data Science > Data Mining > Text Mining (0.60)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.60)

Exploring the Contribution of Unlabeled Data in Financial Sentiment Analysis

Ren, Jimmy SJ. (City University of Hong Kong) | Wang, Wei (City University of Hong Kong) | Wang, Jiawei (USTC-CityU Joint Advanced Research Centre) | Liao, Stephen (City University of Hong Kong)

With the proliferation of its applications in various industries, sentiment analysis using publicly available web data has become an active research area in text classification in recent years. Researchers argue that semi-supervised learning is an effective approach to this problem, since it can reduce the manual labeling effort, which is usually expensive and time-consuming. However, there has been a long-running debate on the effectiveness of unlabeled data in text classification. This is partly because many assumptions made in theoretical analysis often do not hold in practice. We argue that this problem may be further understood by adding an additional dimension to the experiment. This allows us to address the problem from the perspective of bias and variance in a broader view. We show that the well-known performance degradation issue caused by unlabeled data can be reproduced as a subset of the whole scenario. We argue that if the bias-variance trade-off is better balanced by a more effective feature selection method, unlabeled data is very likely to boost classification performance. We then propose a feature selection framework in which labeled and unlabeled training samples are both considered, and we discuss its potential for achieving such a balance. The application to financial sentiment analysis is chosen because it not only exemplifies an important application, but its data also possesses better illustrative power. The implications of this study for both text classification and financial sentiment analysis are discussed.

financial sentiment analysis, machine learning, natural language, (4 more...)

Twenty-Seventh AAAI Conference on Artificial Intelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)