 Discourse & Dialogue


An Oral Exam for Measuring a Dialog System’s Capabilities

AAAI Conferences

This paper suggests a model and methodology for measuring the breadth and flexibility of a dialog system's capabilities. The approach relies on having human evaluators administer a targeted oral exam to a system and provide their subjective views of that system's performance on each test problem. We present results from one instantiation of this test being performed on two publicly-accessible dialog systems and a human, and show that the suggested metrics do provide useful insights into the relative strengths and weaknesses of these systems. Results suggest that this approach can be performed with reasonable reliability and with reasonable amounts of effort. We hope that authors will augment their reporting with this approach to improve clarity and make more direct progress toward broadly-capable dialog systems.


Cross-Lingual Taxonomy Alignment with Bilingual Biterm Topic Model

AAAI Conferences

As more and more multilingual knowledge becomes available on the Web, knowledge sharing across languages has become an important task that benefits many applications. One of the most crucial kinds of knowledge on the Web is the taxonomy, which is used to organize and classify Web data. To facilitate knowledge sharing across languages, we need to deal with the problem of cross-lingual taxonomy alignment, which discovers the most relevant category in the target taxonomy of one language for each category in the source taxonomy of another language. Current approaches for aligning cross-lingual taxonomies rely strongly on domain-specific information and features based on string similarities. In this paper, we present a new approach to cross-lingual taxonomy alignment that does not use any domain-specific information. We first identify the candidate matched categories in the target taxonomy for each category in the source taxonomy using cross-lingual string similarity. We then propose a novel bilingual topic model, called Bilingual Biterm Topic Model (BiBTM), to perform exact matching. BiBTM is trained on textual contexts extracted from the Web. We conduct experiments on two kinds of real-world datasets. The experimental results show that our approach significantly outperforms the state-of-the-art methods used for comparison.


Context-Sensitive Twitter Sentiment Classification Using Neural Network

AAAI Conferences

Sentiment classification on Twitter has attracted increasing research in recent years. Most existing work focuses on feature engineering according to the tweet content itself. In this paper, we propose a context-based neural network model for Twitter sentiment analysis, incorporating contextualized features from relevant Tweets into the model in the form of word embedding vectors. Experiments on both balanced and unbalanced datasets show that our proposed models outperform the current state-of-the-art.


Identifying Sentiment Words Using an Optimization Model with L1 Regularization

AAAI Conferences

Sentiment word identification is a fundamental task in numerous applications of sentiment analysis and opinion mining, such as review mining, opinion holder finding, and Twitter classification. In this paper, we propose an optimization model with L1 regularization, called ISOMER, for identifying the sentiment words in a corpus. Our model can employ both seed words and documents with sentiment labels, unlike most existing work, which uses seed words only. The L1 penalty in the objective function yields a sparse solution since most candidate words have no sentiment. Experiments on real datasets show that ISOMER outperforms the classic approaches, and that the lexicon learned by ISOMER can be effectively adapted to document-level sentiment analysis.
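
The sparsity effect of an L1 penalty can be illustrated with soft-thresholding, the proximal operator of the L1 norm. The sketch below is not ISOMER's objective or solver; the word scores and the penalty value are made-up numbers chosen only to show how small weights are driven exactly to zero while large ones survive.

```python
def soft_threshold(value: float, penalty: float) -> float:
    """Proximal operator of penalty * |w|: shrink toward zero, clip at zero."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


# Hypothetical unregularized word scores (illustrative values, not from the paper).
raw_scores = {"great": 0.9, "awful": -0.8, "table": 0.05, "the": -0.02}
penalty = 0.1  # assumed L1 strength

# Apply the shrinkage to every candidate word.
sparse_scores = {w: soft_threshold(s, penalty) for w, s in raw_scores.items()}

# Words whose score survives the shrinkage are kept as sentiment words.
sentiment_words = [w for w, s in sparse_scores.items() if s != 0.0]
print(sentiment_words)
```

With these assumed numbers, only the two strongly scored words keep a nonzero weight, which mirrors the abstract's point that most candidate words carry no sentiment.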


Modeling Topic-Level Academic Influence in Scientific Literatures

AAAI Conferences

Scientific articles are not born equal. Some generate an entire discipline while others make relatively fewer contributions. When reviewing scientific literature, it would be useful to identify those important articles and understand how they influence others. In this paper, we introduce J-Index, a quantitative metric modeling topic-level academic influence. J-Index is calculated based on the novelty of each article as well as its contributions to the articles where it is cited. We devise a generative model named Reference Topic Model (RefTM) which jointly utilizes the textual content and citation information in scientific literature. We show how to learn RefTM to discover both the novelty of each paper and the strength of each citation. Experiments on a collection of more than 420,000 research papers demonstrate that RefTM outperforms the state-of-the-art approaches in terms of topic coherence as well as prediction performance, and validate J-Index's effectiveness in capturing topic-level academic influence in scientific literature.


An Intelligent Dialogue Agent for the IoT Home

AAAI Conferences

In this paper, we propose an intelligent dialogue agent for the IoT home. The goal of the proposed system is to efficiently control IoT devices with natural spoken dialogue. This system is made up of the following components: Spoken Language Understanding for analyzing textual input and understanding user intention, Dialogue Management with a State Manager that consists of dialogue policies, Context Manager for understanding the environment, Action Planner responsible for generating a sequence of actions to achieve user intention, Things Manager for observing and controlling IoT devices, and Natural Language Generation that generates natural language from computer-based representation. This system is fully implemented in software and is evaluated in a real IoT home environment.


PyData Singapore

@machinelearnbot

Synopsis: There is more to Text Mining than TDM and TF-IDF. Come explore the world of Sentiment Analysis using Advanced Text Mining techniques with cutting-edge tools like Stanford's CoreNLP, and analysing its output using Python. Speaker: Aditya Shankar is a Lecturer in the Intelligent Systems practice at the Institute of Systems Science in the National University of Singapore. He started his career consulting for Microsoft in Redmond, WA, Nike in Portland, OR and T-Mobile in Seattle, WA. He then moved on to work for companies in the Healthcare domain, mostly healthcare providers in Tennessee.


Partial Membership Latent Dirichlet Allocation

arXiv.org Machine Learning

Topic models (e.g., pLSA, LDA, SLDA) have been widely used for segmenting imagery. These models are confined to crisp segmentation. Yet, there are many images in which some regions cannot be assigned a crisp label (e.g., transition regions between a foggy sky and the ground or between sand and water at a beach). In these cases, a visual word is best represented with partial memberships across multiple topics. To address this, we present a partial membership latent Dirichlet allocation (PM-LDA) model and associated parameter estimation algorithms. Experimental results on two natural image datasets and one SONAR image dataset show that PM-LDA can produce both crisp and soft semantic image segmentations, a capability existing methods do not have.


Machine Learning for Sentiment Analysis • /r/MachineLearning

#artificialintelligence

I have been trying to use ML for sentiment analysis of sentences. I have been successful with Naive Bayes and SVM, but I would like to implement neural networks for sentiment analysis and couldn't find a way to convert words into input for neural networks. I know that representing a word as a single number is not efficient. How is nlpnet implemented? I tried to understand it, but it went over my head.
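
One common way to feed words to a neural network is to map each word to an integer id and then look up a dense vector (an embedding) for that id. The snippet below is only a minimal sketch of that idea using the standard library. It does not show how nlpnet works, and the toy sentences, the `EMBED_DIM` size, and the random initial vectors are all assumptions for illustration; a real model would learn the vectors during training or load pretrained ones.

```python
import random

# Toy training sentences, already tokenized (made-up data).
sentences = [["good", "movie"], ["bad", "movie"]]

# Map each distinct word to an integer id; id 0 is reserved for unknown words.
vocab = {"<unk>": 0}
for sentence in sentences:
    for word in sentence:
        vocab.setdefault(word, len(vocab))

EMBED_DIM = 4  # assumed embedding size
rng = random.Random(0)

# One small random vector per word id (stand-in for learned embeddings).
embeddings = [
    [rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)] for _ in vocab
]


def encode(sentence):
    """Turn a list of words into a list of embedding vectors."""
    return [embeddings[vocab.get(word, 0)] for word in sentence]


def average(vectors):
    """Average word vectors into one fixed-size sentence vector."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# "film" is not in the vocabulary, so it falls back to the <unk> vector.
features = average(encode(["good", "film"]))
print(len(features))
```

The averaged vector has a fixed length regardless of sentence length, so it can be passed to an ordinary feed-forward classifier; sequence models would instead consume the per-word vectors from `encode` directly.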


Nonparametric Spherical Topic Modeling with Word Embeddings

arXiv.org Machine Learning

Traditional topic models do not account for semantic regularities in language. Recent distributional representations of words exhibit semantic consistency over directional metrics such as cosine similarity. However, neither categorical nor Gaussian observational distributions used in existing topic models are appropriate to leverage such correlations. In this paper, we propose to use the von Mises-Fisher distribution to model the density of words over a unit sphere. Such a representation is well-suited for directional data. We use a Hierarchical Dirichlet Process for our base topic model and propose an efficient inference algorithm based on Stochastic Variational Inference. This model enables us to naturally exploit the semantic structures of word embeddings while flexibly discovering the number of topics. Experiments demonstrate that our method outperforms competitive approaches in terms of topic coherence on two different text corpora while offering efficient inference.
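
For reference, the von Mises-Fisher density on the unit sphere is usually written in the textbook form below. The notation ($\mu$ for the mean direction, $\kappa$ for the concentration, $d$ for the embedding dimension, $I_\nu$ for the modified Bessel function of the first kind) is assumed here and is not taken from the paper. Because $x$ and $\mu$ are unit vectors, the exponent $\kappa\,\mu^\top x$ is a scaled cosine similarity, which is why the distribution suits directional data.

```latex
f(x \mid \mu, \kappa) = C_d(\kappa)\, \exp\!\left(\kappa\, \mu^{\top} x\right),
\qquad \lVert x \rVert = \lVert \mu \rVert = 1,\ \kappa \ge 0,
\qquad
C_d(\kappa) = \frac{\kappa^{d/2 - 1}}{(2\pi)^{d/2}\, I_{d/2 - 1}(\kappa)}
```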