AITopics | Discourse & Dialogue

Discourse & Dialogue

Understanding Language in Conversations

"The problems addressed in discourse research aim to answer two general kinds of questions: (1) what information is contained in extended sequences of utterances that goes beyond the meaning of the individual utterances themselves? (2) how does the context in which an utterance is used affect the meaning of the individual utterances, or parts of them?"
– Barbara Grosz. Overview of Chapter 6: Discourse and Dialogue, Survey of the State of the Art in Human Language Technology (1996).

Tag-Weighted Topic Model For Large-scale Semi-Structured Documents

Li, Shuangyin, Li, Jiefei, Huang, Guan, Tan, Ruiyang, Pan, Rong

arXiv.org Machine Learning, Jul-30-2015

The evolution of the Internet has produced a massive number of Semi-Structured Documents (SSDs). These SSDs contain both unstructured features (e.g., plain text) and metadata (e.g., tags). Most previous work focused on modeling the unstructured text, and recently some other methods have been proposed to model the unstructured text with specific tags. Building a general model for SSDs remains an important problem in terms of both model fitness and efficiency. We propose a novel method to model SSDs, the Tag-Weighted Topic Model (TWTM). TWTM is a framework that leverages both tag and word information, not only to learn the document-topic and topic-word distributions, but also to infer the tag-topic distributions for text mining tasks. We present an efficient variational inference method with an EM algorithm for estimating the model parameters. We also propose three large-scale solutions for our model on the MapReduce distributed computing platform for modeling large-scale SSDs. Experimental results demonstrate the effectiveness, efficiency and robustness of our model in comparison with state-of-the-art methods in document modeling, tag prediction and text classification. We also show the performance of the three distributed solutions in terms of time and accuracy on document modeling.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

1507.08396

Country:

Asia (0.46)
North America > United States (0.28)

Genre:

Research Report > Promising Solution (0.54)
Research Report > New Finding (0.34)

Industry:

Education (0.67)
Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.87)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Document Embedding with Paragraph Vectors

Dai, Andrew M., Olah, Christopher, Le, Quoc V.

arXiv.org Artificial Intelligence, Jul-28-2015

Paragraph Vectors has been recently proposed as an unsupervised method for learning distributed representations for pieces of text. In their work, the authors showed that the method can learn an embedding of movie review texts which can be leveraged for sentiment analysis. That proof of concept, while encouraging, was rather narrow. Here we consider tasks other than sentiment analysis, provide a more thorough comparison of Paragraph Vectors to other document modelling algorithms such as Latent Dirichlet Allocation, and evaluate performance of the method as we vary the dimensionality of the learned representation. We benchmarked the models on two document similarity data sets, one from Wikipedia, one from arXiv. We observe that the Paragraph Vector method performs significantly better than other methods, and propose a simple improvement to enhance embedding quality. Somewhat surprisingly, we also show that, much like word embeddings, vector operations on Paragraph Vectors can produce useful semantic results.
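
The kind of vector operation the abstract mentions can be illustrated with a minimal sketch. It assumes document embeddings are already available as plain lists of floats; the vectors, document names and the `analogy` helper below are hypothetical and hand-picked for illustration, not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(vectors, a, b, c):
    """Return the key closest (by cosine) to vec(a) - vec(b) + vec(c).

    The three query keys themselves are excluded from the candidates.
    """
    query = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [k for k in vectors if k not in (a, b, c)]
    return max(candidates, key=lambda k: cosine(query, vectors[k]))


# Hypothetical 3-dimensional document embeddings (illustration only).
doc_vectors = {
    "doc_pop_us": [1.0, 0.0, 1.0],
    "doc_us": [0.0, 0.0, 1.0],
    "doc_jp": [0.0, 1.0, 0.0],
    "doc_pop_jp": [1.0, 1.0, 0.0],
    "doc_other": [0.0, 0.2, 0.9],
}

# "doc_pop_us" - "doc_us" + "doc_jp" should land near "doc_pop_jp".
result = analogy(doc_vectors, "doc_pop_us", "doc_us", "doc_jp")
```

With real embeddings the nearest neighbour is rarely an exact match, so a ranked list of the top few candidates is usually more informative than a single answer.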

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

1507.07998

Country:

North America > United States > California (0.28)
Asia (0.28)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment (0.91)
Media > Music (0.30)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)

Incremental Variational Inference for Latent Dirichlet Allocation

Archambeau, Cedric, Ermis, Beyza

arXiv.org Machine Learning, Jul-22-2015

We introduce incremental variational inference and apply it to latent Dirichlet allocation (LDA). Incremental variational inference is inspired by incremental EM and provides an alternative to stochastic variational inference. Incremental LDA can process massive document collections, does not require setting a learning rate, converges faster to a local optimum of the variational bound and enjoys the attractive property of monotonically increasing it. We study the performance of incremental LDA on large benchmark data sets. We further introduce a stochastic approximation of incremental variational inference which extends to the asynchronous distributed setting. The resulting distributed algorithm achieves performance comparable to single-host incremental variational inference, but with a significant speed-up.
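
The bookkeeping idea behind incremental EM, which the abstract cites as its inspiration, can be sketched as follows. This is only an illustration of the general pattern (keep each document's contribution to the global sufficient statistics and swap it out when the document is revisited); the class and method names are hypothetical, and the paper's actual LDA updates are not reproduced here.

```python
class IncrementalStats:
    """Global sufficient statistics kept as a sum of per-document contributions."""

    def __init__(self, size):
        self.total = [0.0] * size  # running global statistics
        self.per_doc = {}          # last contribution stored for each document

    def update(self, doc_id, new_stats):
        """Replace a document's old contribution with its new one.

        No learning rate is involved: the old value is subtracted exactly
        and the new value is added exactly.
        """
        old = self.per_doc.get(doc_id, [0.0] * len(self.total))
        self.total = [t - o + n for t, o, n in zip(self.total, old, new_stats)]
        self.per_doc[doc_id] = list(new_stats)


stats = IncrementalStats(size=2)
stats.update("doc_a", [1.0, 2.0])
stats.update("doc_b", [3.0, 4.0])
stats.update("doc_a", [0.0, 1.0])  # revisiting doc_a swaps out its old contribution
```

The trade-off in this pattern is memory: one stored contribution per document, in exchange for exact replacement instead of a decaying step size.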

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Machine Learning

1507.05016

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.86)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.73)

A Subspace Learning Framework for Cross-Lingual Sentiment Classification with Partial Parallel Data

Zhou, Guangyou (Central China Normal University) | He, Tingting (Central China Normal University) | Zhao, Jun (National Laboratory of Pattern Recognition, CASIA) | Wu, Wensheng (University of Southern California)

AAAI Conferences, Jul-15-2015

Cross-lingual sentiment classification aims to automatically predict sentiment polarity (e.g., positive or negative) of data in a label-scarce target language by exploiting labeled data from a label-rich language. The fundamental challenge of cross-lingual learning stems from a lack of overlap between the feature space of the source language data and that of the target language data. To address this challenge, previous work in the literature mainly relies on large amounts of bilingual parallel corpora to bridge the language gap. In many real applications, however, only partial parallel data is available, and acquiring a large amount of parallel data across different languages is expensive and time-consuming. In this paper, we propose a novel subspace learning framework that leverages the partial parallel data for cross-lingual sentiment classification. The proposed approach jointly learns from the document-aligned review data and the un-aligned data of the source and target languages via a non-negative matrix factorization framework. We conduct a set of experiments with cross-lingual sentiment classification tasks on multilingual Amazon product reviews. Our experimental results demonstrate the efficacy of the proposed cross-lingual approach.
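
For readers unfamiliar with non-negative matrix factorization, the building block named in the abstract, here is a minimal sketch of plain NMF using the standard multiplicative updates for squared error. It is not the authors' joint formulation over aligned and un-aligned data; the toy matrix, the helper names and the parameters are assumptions made for illustration.

```python
import random


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def squared_error(X, W, H):
    """Squared Frobenius norm of X - WH."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))


def nmf(X, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix X into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    # Strictly positive initialisation keeps every later update well defined.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T X) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, rn, rd)]
             for rh, rn, rd in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, rn, rd)]
             for rw, rn, rd in zip(W, num, den)]
    return W, H


# Toy document-term matrix with two blocks of co-occurring terms.
X = [[3.0, 2.0, 0.0, 0.0],
     [2.0, 3.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 4.0],
     [0.0, 0.0, 2.0, 3.0]]

W0, H0 = nmf(X, rank=2, iterations=0)   # random initialisation only
W, H = nmf(X, rank=2, iterations=200)   # same seed, after the updates
error_before = squared_error(X, W0, H0)
error_after = squared_error(X, W, H)
```

Rows of `H` can be read as latent components over terms and rows of `W` as each document's non-negative loading on those components.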

classification, sentiment classification, target language, (13 more...)

AAAI Conferences

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Asia > China > Beijing > Beijing (0.04)
Asia > India (0.04)
Asia > China > Hubei Province > Wuhan (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Feature Ensemble Plus Sample Selection: Domain Adaptation for Sentiment Classification (Extended Abstract)

Xia, Rui (Nanjing University of Science and Technology) | Zong, Chengqing (Chinese Academy of Sciences) | Hu, Xuelei (Nanjing University of Science and Technology) | Cambria, Erik (Nanyang Technological University)

AAAI Conferences, Jul-15-2015

The domain adaptation problem arises often in the field of sentiment classification. There are two distinct needs in domain adaptation, namely labeling adaptation and instance adaptation. Most current research focuses on the former while neglecting the latter. In this work, we propose a joint approach, named feature ensemble plus sample selection (SS-FE), which takes both types of adaptation into account. A feature ensemble (FE) model is first proposed to learn a new labeling function in a feature re-weighting manner. Furthermore, a PCA-based sample selection (PCA-SS) method is proposed as an aid to FE for instance adaptation. Experimental results show that the proposed SS-FE approach gains significant improvements over individual FE and PCA-SS, due to its comprehensive consideration of both labeling adaptation and instance adaptation.

adaptation, domain adaptation, sentiment classification, (13 more...)

AAAI Conferences

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Europe > Switzerland (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.88)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.88)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.63)

Developing Corpora for Sentiment Analysis: The Case of Irony and Senti-TUT (Extended Abstract)

Bosco, Cristina (Dipartimento di Informatica, Università di Torino) | Patti, Viviana (Dipartimento di Informatica, Università di Torino) | Bolioli, Andrea (CELI srl)

AAAI Conferences, Jul-15-2015

This paper focusses on the main issues related to the development of a corpus for opinion and sentiment analysis, with special attention to irony, and presents as a case study Senti-TUT, a project for Italian aimed at investigating sentiment and irony in social media. We present the Senti-TUT corpus, a collection of texts from Twitter annotated with sentiment polarity. We describe the dataset, the annotation, the methodologies applied and our investigations on two important features of irony: polarity reversing and emotion expressions.

annotation, emotion, tweet, (15 more...)

AAAI Conferences

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.05)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.05)
Europe > Italy > Piedmont > Turin Province > Turin (0.04)
(5 more...)

Industry: Government (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Topic Modeling with Document Relative Similarities

Du, Jianguang (Beijing Institute of Technology) | Jiang, Jing (Singapore Management University) | Song, Dandan (Beijing Institute of Technology) | Liao, Lejian (Beijing Institute of Technology)

AAAI Conferences, Jul-15-2015

Topic modeling has been widely used in text mining. Previous topic models such as Latent Dirichlet Allocation (LDA) are successful in learning hidden topics but they do not take into account metadata of documents. To tackle this problem, many augmented topic models have been proposed to jointly model text and metadata. But most existing models handle only categorical and numerical types of metadata. We identify another type of metadata that can be more natural to obtain in some scenarios. These are relative similarities among documents. In this paper, we propose a general model that links LDA with constraints derived from document relative similarities. Specifically, in our model, the constraints act as a regularizer of the log likelihood of LDA. We fit the proposed model using Gibbs-EM. Experiments with two real world datasets show that our model is able to learn meaningful topics. The results also show that our model outperforms the baselines in terms of topic coherence and a document classification task.
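
One way to picture constraints acting as a regularizer of a log likelihood is the sketch below. Everything specific in it is an assumption: the abstract does not give the constraint form, so the code uses a hinge penalty on triples (i, j, k), meaning "document i should be more similar to j than to k", measured by squared Euclidean distance between topic proportions. The function names, the `weight` parameter and the toy numbers are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two topic-proportion vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def constraint_penalty(theta, triples, margin=0.0):
    """Sum of hinge penalties over relative-similarity triples.

    A triple (i, j, k) is satisfied, and costs nothing, when document i
    is closer to document j than to document k by at least `margin`.
    """
    total = 0.0
    for i, j, k in triples:
        violation = margin + sq_dist(theta[i], theta[j]) - sq_dist(theta[i], theta[k])
        total += max(0.0, violation)
    return total


def regularized_objective(log_likelihood, theta, triples, weight=1.0):
    """Log likelihood minus a weighted constraint penalty (to be maximised)."""
    return log_likelihood - weight * constraint_penalty(theta, triples)


# Hypothetical topic proportions for three documents over two topics.
theta = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]

satisfied = constraint_penalty(theta, [(0, 1, 2)])  # doc 0 really is closer to doc 1
violated = constraint_penalty(theta, [(0, 2, 1)])   # contradicts the proportions
objective = regularized_objective(-100.0, theta, [(0, 2, 1)], weight=2.0)
```

Setting `weight` to zero recovers the unregularized log likelihood, which makes the strength of the constraints easy to tune against topic quality.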

constraint, metadata, relative similarity, (13 more...)

AAAI Conferences

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Asia > Singapore (0.04)
Asia > China > Beijing > Beijing (0.04)
Asia > Middle East > Jordan (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.91)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)

Unsupervised Sentiment Analysis for Social Media Images

Wang, Yilin (Arizona State University) | Wang, Suhang (Arizona State University) | Tang, Jiliang (Arizona State University) | Liu, Huan (Arizona State University) | Li, Baoxin (Arizona State University)

AAAI Conferences, Jul-15-2015

Recently text-based sentiment prediction has been extensively studied, while image-centric sentiment analysis receives much less attention. In this paper, we study the problem of understanding human sentiments from large-scale social media images, considering both visual content and contextual information, such as comments on the images, captions, etc. The challenge of this problem lies in the "semantic gap" between low-level visual features and higher-level image sentiments. Moreover, ...

Current methods of sentiment analysis for social media images include low-level visual feature based approaches [Jia et al., 2012; Yang et al., 2014], mid-level visual feature based approaches [Borth et al., 2013; Yuan et al., 2013] and deep learning based approaches [You et al., 2015]. The vast majority of existing methods are supervised, relying on labeled images to train sentiment classifiers. Unfortunately, sentiment labels are in general unavailable for social media images, and it is too labor- and time-intensive to obtain labeled sets large enough for robust training. In order to utilize the vast amount of unlabeled social media images, an unsupervised approach would be much more desirable.

information, sentiment analysis, social media image, (11 more...)

AAAI Conferences

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country: North America > United States > Arizona > Maricopa County > Tempe (0.05)

Industry: Information Technology > Services (0.33)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Personalized Sentiment Classification Based on Latent Individuality of Microblog Users

AAAI Conferences, Jul-15-2015

Sentiment expression in microblog posts often reflects a user's specific individuality, owing to differing language habits, personal character, opinion bias and so on. Existing sentiment classification algorithms largely ignore such latent personal distinctions among different microblog users. Meanwhile, sentiment data of microblogs are sparse for individual users, making it infeasible to learn an effective personalized classifier. In this paper, we propose a novel, extensible personalized sentiment classification method based on a variant of the latent factor model to capture personal sentiment variations by mapping users and posts into a low-dimensional factor space. We alleviate the sparsity of personal texts by decomposing the posts into words, which are further represented by weighted sentiment and topic units based on a set of syntactic units of words obtained from dependency parsing results. To strengthen the representation of users, we leverage users' following relations to consolidate the individuality of a user with that of other users with similar interests. Results on real-world microblog datasets confirm that our method outperforms state-of-the-art baseline algorithms by large margins.

individuality, proceedings, relation, (16 more...)

AAAI Conferences

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Asia > China > Hong Kong (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)
(3 more...)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Linking Heterogeneous Input Features with Pivots for Domain Adaptation

Zhou, Guangyou (Central China Normal University) | He, Tingting (Central China Normal University) | Wu, Wensheng (University of Southern California) | Hu, Xiaohua Tony (Central China Normal University)

AAAI Conferences, Jul-15-2015

Sentiment classification aims to automatically predict sentiment polarity (e.g., positive or negative) of user-generated sentiment data (e.g., reviews, blogs). In real applications, such user-generated sentiment data can span so many different domains that it is difficult to manually label training data for all of them. Hence, this paper studies the problem of domain adaptation for sentiment classification, where a system trained using labeled reviews from a source domain is deployed to classify sentiments of reviews in a different target domain. In this paper, we propose to link heterogeneous input features with pivots via joint non-negative matrix factorization. This is achieved by learning the domain-specific information from different domains into unified topics, with the help of pivots across all domains. We conduct experiments on a benchmark composed of reviews of 4 types of Amazon products. Experimental results show that our proposed approach significantly outperforms the baseline method, and achieves an accuracy which is competitive with the state-of-the-art methods for sentiment classification adaptation.

adaptation, classification, sentiment classification, (14 more...)

AAAI Conferences

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)
