AITopics | Discourse & Dialogue

Discourse & Dialogue

Understanding Language in Conversations

"The problems addressed in discourse research aim to answer two general kinds of questions: (1) what information is contained in extended sequences of utterances that goes beyond the meaning of the individual utterances themselves? (2) how does the context in which an utterance is used affect the meaning of the individual utterances, or parts of them?"
– Barbara Grosz. Overview of Chapter 6: Discourse and Dialogue, Survey of the State of the Art in Human Language Technology (1996).

Active learning in annotating micro-blogs dealing with e-reputation

Cossu, Jean-Valère, Molina-Villegas, Alejandro, Tello-Signoret, Mariana

arXiv.org Artificial Intelligence, Sep-25-2017

Elections unleash strong political views on Twitter, but what do people really think about politics? Opinion and trend mining on micro-blogs dealing with politics has recently attracted researchers in several fields, including Information Retrieval and Machine Learning (ML). Since the performance of ML and Natural Language Processing (NLP) approaches is limited by the amount and quality of data available, one promising alternative for some tasks is the automatic propagation of expert annotations. This paper intends to develop a so-called active learning process for automatically annotating French-language tweets that deal with the image (i.e., representation, web reputation) of politicians. Our main focus is on the methodology followed to build an original annotated dataset expressing opinions about two French politicians over time. We therefore review state-of-the-art NLP-based ML algorithms to automatically annotate tweets using a manual initiation step as bootstrap. This paper focuses on key issues in active learning while building a large annotated data set in the presence of noise, which is introduced by human annotators, the abundance of data, and the label distribution across data and entities. In turn, we show that Twitter characteristics such as the author's name or hashtags can serve as a basis not only for improving automatic systems for Opinion Mining (OM) and Topic Classification but also for reducing noise in human annotations. However, a later thorough analysis shows that reducing noise might induce the loss of crucial information.
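As a rough illustration of the bootstrap-then-propagate loop this abstract describes, the sketch below runs pool-based active learning with uncertainty sampling. Everything in it is an assumption made for illustration: the tiny naive Bayes scorer, the `oracle` function standing in for a human annotator, and the toy English tweets. It is not the authors' pipeline.

```python
import math
from collections import Counter

def train(labeled):
    """Count word occurrences per label (a tiny naive Bayes model)."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in labeled:
        counts[label].update(text.lower().split())
    return counts

def prob_pos(model, text):
    """P(pos | text) with add-one smoothing and a uniform class prior."""
    vocab = set(model["pos"]) | set(model["neg"])
    pos_total = sum(model["pos"].values()) + len(vocab) + 1
    neg_total = sum(model["neg"].values()) + len(vocab) + 1
    log_odds = 0.0
    for w in text.lower().split():
        p = (model["pos"][w] + 1) / pos_total
        n = (model["neg"][w] + 1) / neg_total
        log_odds += math.log(p / n)
    return 1 / (1 + math.exp(-log_odds))

def active_learning(seed, pool, oracle, budget):
    """Query the oracle for the most uncertain pool item, retrain, repeat."""
    labeled, pool = list(seed), list(pool)
    for _ in range(min(budget, len(pool))):
        model = train(labeled)
        # Uncertainty sampling: pick the item whose P(pos) is closest to 0.5.
        query = min(pool, key=lambda t: abs(prob_pos(model, t) - 0.5))
        pool.remove(query)
        labeled.append((query, oracle(query)))
    return train(labeled), labeled

def oracle(text):
    """Hypothetical stand-in for a human annotator."""
    return "pos" if "great" in text else "neg"

seed = [("great speech today", "pos"), ("terrible debate performance", "neg")]
pool = ["great performance", "terrible speech", "debate today"]
model, labeled = active_learning(seed, pool, oracle, budget=2)
```

The manual seed plays the role of the bootstrap step; each round spends one unit of annotation budget on the tweet the current model is least sure about.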

annotation, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.18713/JIMIS-010917-3-2

1706.05349

Country:

Europe > France (0.68)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(7 more...)

Genre: Research Report (1.00)

Industry:

Government > Voting & Elections (0.67)
Information Technology > Services (0.46)
Government > Regional Government > Europe Government > France Government (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.88)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
(2 more...)

Computational Content Analysis of Negative Tweets for Obesity, Diet, Diabetes, and Exercise

Shaw, George Jr., Karami, Amir

arXiv.org Machine Learning, Sep-22-2017

Social media based digital epidemiology has the potential to support faster response and deeper understanding of public health related threats. This study proposes a new framework to analyze unstructured health-related textual data from Twitter users' posts (tweets) to characterize negative health sentiments and non-health-related concerns in relation to the corpus of negative sentiments regarding Diet, Diabetes, Exercise, and Obesity (DDEO). Through the collection of 6 million tweets over one month, this study identified the prominent topics of users as they relate to the negative sentiments. Our proposed framework uses two text mining methods, sentiment analysis and topic modeling, to discover negative topics. The negative sentiments of Twitter users support the literature narratives and the many morbidity issues that are associated with DDEO and the linkage between obesity and diabetes. The framework offers a potential method to understand the public's opinions and sentiments regarding DDEO. More importantly, this research provides new opportunities for computational social scientists, medical experts, and public health professionals to collectively address DDEO-related issues.
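The two-stage idea (filter by sentiment, then look for themes) can be sketched as follows. The mini-lexicons, the stopword list, and the toy tweets are hypothetical, and the second stage uses plain term counting as a crude stand-in for a real topic model; the study's actual tools are not reproduced here.

```python
from collections import Counter

# Hypothetical mini-lexicons; real studies use established sentiment resources.
NEGATIVE = {"hate", "sick", "tired", "worse", "bad"}
POSITIVE = {"love", "great", "happy", "better", "good"}
STOPWORDS = {"i", "my", "is", "the", "of", "this", "and", "so", "a", "am"}

def sentiment_score(tweet):
    """Positive-word count minus negative-word count."""
    words = tweet.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def negative_terms(tweets, top_n=3):
    """Stage 1: keep negative tweets. Stage 2: count their content words."""
    negative = [t for t in tweets if sentiment_score(t) < 0]
    counts = Counter(
        w
        for t in negative
        for w in t.lower().split()
        if w not in STOPWORDS and w not in NEGATIVE
    )
    return negative, counts.most_common(top_n)

tweets = [
    "I hate this diet",
    "my diabetes is worse and i am tired of this diet",
    "love my new exercise routine",
]
negative, terms = negative_terms(tweets)
```

In a fuller pipeline the counting step would be replaced by a topic model fitted only on the tweets that survive the sentiment filter.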

artificial intelligence, natural language, social media, (17 more...)

arXiv.org Machine Learning

1709.07915

Country: North America > United States (1.00)

Genre: Research Report > Experimental Study (0.88)

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.90)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.90)

Sentiment Analysis Just Got Smarter

@machinelearnbot, Sep-20-2017, 19:35:16 GMT

Sentiment analysis, sometimes called opinion mining, is one of the easiest and quickest ways to find out what consumers are thinking about a brand, product or event. It's a natural language processing technique, often used in social listening scenarios, that aims to systematically identify opinions in a document and score the document as positive, negative or neutral. There are few things as mind-numbingly tedious as manually tagging documents with the right sentiment because the technology doesn't get it. Sentiment analysis (ironically) has a bad reputation in the social listening industry because, truth be told, it needs a lot of manual work to deliver great results. Our data science guys (the brains behind our award-winning image recognition technology) have been working on fixing this behind the scenes, and I'm excited to finally share their fantastic results.

artificial intelligence, natural language, sentiment analysis, (12 more...)

@machinelearnbot

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Text Compression for Sentiment Analysis via Evolutionary Algorithms

Dufourq, Emmanuel, Bassett, Bruce A.

arXiv.org Machine Learning, Sep-20-2017

Can textual data be compressed intelligently without losing accuracy in evaluating sentiment? In this study, we propose a novel evolutionary compression algorithm, PARSEC (PARts-of-Speech for sEntiment Compression), which makes use of Parts-of-Speech tags to compress text in a way that sacrifices minimal classification accuracy when used in conjunction with sentiment analysis algorithms. An analysis of PARSEC with eight commercial and non-commercial sentiment analysis algorithms on twelve English sentiment data sets reveals that accurate compression is possible with (0%, 1.3%, 3.3%) loss in sentiment classification accuracy for (20%, 50%, 75%) data compression with PARSEC using LingPipe, the most accurate of the sentiment algorithms. Other sentiment analysis algorithms are more severely affected by compression. We conclude that significant compression of text data is possible for sentiment analysis depending on the accuracy demands of the specific application and the specific sentiment analysis algorithm used.
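To make the compression side of this trade-off concrete, here is a minimal sketch of part-of-speech-based compression: tokens whose tag falls in a drop set are removed, and the compression ratio is the fraction of tokens removed. The hand-written `TOY_TAGS` lexicon, the choice of `drop_tags`, and the function name are assumptions for illustration. PARSEC is described as an evolutionary algorithm, and that search is not reproduced here.

```python
# Hypothetical toy tag lexicon; a real system would use a trained POS tagger.
TOY_TAGS = {
    "the": "DET", "a": "DET", "movie": "NOUN", "plot": "NOUN",
    "was": "VERB", "is": "VERB", "really": "ADV", "very": "ADV",
    "good": "ADJ", "boring": "ADJ", "and": "CONJ",
}

def compress(text, drop_tags):
    """Drop every token whose (toy) POS tag is in drop_tags.

    Returns the compressed text and the fraction of tokens removed.
    """
    tokens = text.lower().split()
    kept = [t for t in tokens if TOY_TAGS.get(t, "UNK") not in drop_tags]
    ratio = 1 - len(kept) / len(tokens) if tokens else 0.0
    return " ".join(kept), ratio

compressed, ratio = compress("The movie was really good", {"DET", "VERB", "ADV"})
# compressed is "movie good": 3 of 5 tokens were removed, a ratio of about 0.6
```

Measuring the accuracy loss reported in the abstract would additionally require running a sentiment classifier on both the original and the compressed text and comparing the results.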

compressor, machine learning, natural language, (15 more...)

arXiv.org Machine Learning

1709.0699

Country: North America > United States (0.46)

Genre: Research Report (0.70)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.54)

Google rolls out improvements to classification, sentiment analysis in Natural Language API

#artificialintelligence, Sep-19-2017, 20:05:30 GMT

This is a Techmeme archive page. It shows how the site appeared at 3:50 PM ET, September 19, 2017.

artificial intelligence, natural language api, sentiment analysis, (2 more...)

#artificialintelligence

Technology:

Information Technology > Communications (0.90)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.40)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.40)

Stability of Topic Modeling via Matrix Factorization

Belford, Mark, Mac Namee, Brian, Greene, Derek

arXiv.org Machine Learning, Sep-9-2017

Topic models can provide us with an insight into the underlying latent structure of a large corpus of documents. A range of methods have been proposed in the literature, including probabilistic topic models and techniques based on matrix factorization. However, in both cases, standard implementations rely on stochastic elements in their initialization phase, which can potentially lead to different results being generated on the same corpus when using the same parameter values. This corresponds to the concept of "instability" which has previously been studied in the context of $k$-means clustering. In many applications of topic modeling, this problem of instability is not considered and topic models are treated as being definitive, even though the results may change considerably if the initialization process is altered. In this paper we demonstrate the inherent instability of popular topic modeling approaches, using a number of new measures to assess stability. To address this issue in the context of matrix factorization for topic modeling, we propose the use of ensemble learning strategies. Based on experiments performed on annotated text corpora, we show that a K-Fold ensemble strategy, combining both ensembles and structured initialization, can significantly reduce instability, while simultaneously yielding more accurate topic models.
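One simple way to quantify the instability described here is to compare the top terms of the topics produced by two runs. The sketch below pairs each topic in one run with its most similar topic in the other and averages the Jaccard similarity. This is a generic agreement score chosen for illustration, not one of the measures proposed in the paper, and the function names and toy topics are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of terms."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def run_agreement(run_a, run_b):
    """Mean best-match Jaccard from each topic in run_a to the topics in run_b."""
    return sum(max(jaccard(t, u) for u in run_b) for t in run_a) / len(run_a)

# Top terms per topic from two hypothetical runs with different random seeds.
run_1 = [["goal", "match", "team"], ["vote", "party", "election"]]
run_2 = [["party", "vote", "poll"], ["team", "goal", "match"]]
score = run_agreement(run_1, run_2)
# score is 0.75: one topic matches fully (1.0), the other shares 2 of 4 terms (0.5)
```

Averaging this score over many pairs of runs gives a rough stability estimate; values well below 1.0 indicate that the topics depend noticeably on initialization.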

artificial intelligence, machine learning, natural language, (14 more...)

arXiv.org Machine Learning

1702.07186

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry:

Leisure & Entertainment (0.68)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Overcoming Language Variation in Sentiment Analysis with Social Attention

Yang, Yi, Eisenstein, Jacob

arXiv.org Artificial Intelligence, Aug-26-2017

Variation in language is ubiquitous, particularly in newer forms of writing such as social media. Fortunately, variation is not random; it is often linked to social properties of the author. In this paper, we show how to exploit social networks to make sentiment analysis more robust to social language variation. The key idea is linguistic homophily: the tendency of socially linked individuals to use language in similar ways. We formalize this idea in a novel attention-based neural network architecture, in which attention is divided among several basis models, depending on the author's position in the social network. This has the effect of smoothing the classification function across the social network, and makes it possible to induce personalized classifiers even for authors for whom there is no labeled data or demographic metadata. This model significantly improves the accuracies of sentiment analysis on Twitter and on review data.
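The mixture idea can be sketched in a few lines: each of K basis models produces a positive-class probability for a text, and an attention vector computed from the author's embedding decides how much each basis model contributes. The embedding, the per-basis key vectors, and the function names are hypothetical; the paper's actual architecture and training procedure are not reproduced.

```python
import math

def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def social_attention_predict(basis_probs, author_vec, basis_keys):
    """Mix K basis-model predictions using attention from the author's embedding."""
    # Attention score for each basis model: dot product with the author vector.
    scores = [sum(a * k for a, k in zip(author_vec, key)) for key in basis_keys]
    weights = softmax(scores)
    mixed = sum(w * p for w, p in zip(weights, basis_probs))
    return mixed, weights

# Two basis models disagree about a text; the author's embedding sits nearer key 0.
prob, weights = social_attention_predict(
    basis_probs=[0.9, 0.2],
    author_vec=[1.0, 0.0],
    basis_keys=[[2.0, 0.0], [0.0, 2.0]],
)
```

Because the weights depend only on the author's embedding, authors with similar embeddings receive similar mixtures, which is one way to picture the smoothing across the social network that the abstract mentions.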

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

1511.06052

Country: North America > United States (0.28)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Industry:

Information Technology > Services (0.95)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Who Wants to Know the Inner Workings of LDA?

#artificialintelligence, Aug-25-2017, 14:16:14 GMT

In our recent series of blog posts on Topic Models, we've tried to explore this powerful new resource in the BigML Dashboard, in the API, and using WhizzML, and we have also suggested some uses for it. But we've left a nuts-and-bolts description of how Latent Dirichlet Allocation (LDA) works until the end. In this post, the last in a series of six, we'll try to give you exactly that: a high-level overview of the internal mathematics that underlies Topic Models, and what that mathematics might imply for you, the modeler. While I'll explain a few things here, a more precise and technical explanation given by the inventor of the technique, David Blei, is available. Where there seems to be a conflict between his explanation and mine, rest assured, his is correct!

artificial intelligence, natural language, topic distribution, (15 more...)

#artificialintelligence

Country: Asia > Middle East > Jordan (0.05)

Technology: Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.85)

Natural Language Processing: State of The Art, Current Trends and Challenges

Khurana, Diksha, Koli, Aditya, Khatter, Kiran, Singh, Sukhdev

arXiv.org Artificial Intelligence, Aug-17-2017

Natural language processing (NLP) has recently gained much attention for representing and analysing human language computationally. It has found applications in fields such as machine translation, email spam detection, information extraction, summarization, medicine, and question answering. The paper is organized in four phases: it discusses the different levels of NLP and the components of Natural Language Generation (NLG), presents the history and evolution of NLP, reviews the state of the art across applications of NLP, and examines current trends and challenges.

artificial intelligence, machine learning, text processing, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s11042-022-13428-4

1708.05148

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > New York (0.04)
(6 more...)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Banking & Finance (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
(5 more...)

Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models

Magnusson, Måns, Jonsson, Leif, Villani, Mattias, Broman, David

arXiv.org Machine Learning, Aug-15-2017

Topic models, and more specifically the class of Latent Dirichlet Allocation (LDA), are widely used for probabilistic modeling of text. MCMC sampling from the posterior distribution is typically performed using a collapsed Gibbs sampler. We propose a parallel sparse partially collapsed Gibbs sampler and compare its speed and efficiency to state-of-the-art samplers for topic models on five well-known text corpora of differing sizes and properties. In particular, we propose and compare two different strategies for sampling the parameter block with latent topic indicators. The experiments show that the increase in statistical inefficiency from only partial collapsing is smaller than commonly assumed, and can be more than compensated by the speedup from parallelization and sparsity on larger corpora. We also prove that the partially collapsed samplers scale well with the size of the corpus. The proposed algorithm is fast, efficient, exact, and can be used in more modeling situations than the ordinary collapsed sampler.
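For reference, the baseline mentioned in this abstract can be sketched as a plain collapsed Gibbs sampler for LDA, in which each token's topic is resampled with probability proportional to (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta). This is the standard serial sampler, not the parallel sparse partially collapsed sampler the paper proposes; the hyperparameter values, the function name, and the toy corpus are assumptions.

```python
import random

def collapsed_gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
                        iters=50, seed=0):
    """Serial collapsed Gibbs sampling for LDA; docs are lists of word ids."""
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]               # doc-topic counts
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                # tokens per topic
    z = []                                                # topic of each token

    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        z.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Conditional distribution over topics given all other tokens.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return z, n_dk, n_kw

# Toy corpus: word ids 0-1 co-occur in two documents, ids 2-3 in another.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 1, 0]]
z, n_dk, n_kw = collapsed_gibbs_lda(docs, num_topics=2, vocab_size=4)
```

Every token update reads and writes the shared count tables, which is the sequential dependence that makes this sampler hard to parallelize and motivates partially collapsed alternatives.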

machine learning, natural language, sampler, (18 more...)

arXiv.org Machine Learning

1506.03784

Country: North America (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
