Discourse & Dialogue
Improving Multimodal Accuracy Through Modality Pre-training and Attention
Training a multimodal network is challenging, and it requires complex architectures to achieve reasonable performance. We show that one reason for this phenomenon is that different modalities converge at different rates. We address this by pre-training the modality-specific sub-networks of a multimodal architecture independently before training the entire network end-to-end. Furthermore, we show that adding an attention mechanism between the sub-networks after pre-training helps identify the most important modality in ambiguous scenarios, boosting performance. We demonstrate that with these two tricks, a simple network can match the performance of a complicated architecture that is significantly more expensive to train, on multiple tasks including sentiment analysis, emotion recognition, and speaker trait recognition.
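To make the idea of attention between sub-networks concrete, here is a minimal sketch of attention-weighted fusion over per-modality embeddings. It assumes plain dot-product attention against a single query vector; the query, the 2-D vectors, and the function names are hypothetical, and the abstract does not say which attention variant the authors use.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(
    modality_vecs: Dict[str, List[float]], query: List[float]
) -> Tuple[List[float], Dict[str, float]]:
    """Fuse per-modality embeddings with dot-product attention.

    modality_vecs maps a modality name to the output of its (pre-trained)
    sub-network; query stands in for a learned query vector (assumption).
    Returns the attention-weighted sum and the weight given to each modality.
    """
    names = list(modality_vecs)
    # Score each modality by its dot product with the query.
    scores = [sum(q * x for q, x in zip(query, modality_vecs[n])) for n in names]
    weights = softmax(scores)
    dim = len(query)
    fused = [
        sum(w * modality_vecs[n][i] for w, n in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


# Hypothetical 2-D sub-network outputs for one utterance.
fused, weights = attention_fuse(
    {"text": [1.0, 0.0], "audio": [0.0, 1.0], "video": [0.0, 0.0]},
    query=[2.0, 0.0],
)
```

Here the text embedding aligns best with the query, so it receives the largest weight (about 0.79), and the fused vector is dominated by the text modality. In a trained model, the query would be learned so that the weights shift toward whichever modality is most informative for a given input.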
What we learn from AI's biases
In "How to Make a Racist AI Without Really Trying," Robyn Speer shows how to build a simple sentiment analysis system, using standard, well-known sources for word embeddings (GloVe and word2vec), and a widely used sentiment lexicon. Her program assigns "negative" sentiment to names and phrases associated with minorities, and "positive" sentiment to names and phrases associated with Europeans. Even a sentence like "Let's go get Mexican food" gets a much lower sentiment score than "Let's go get Italian food." That result isn't surprising, nor are Speer's conclusions: if you take a simplistic approach to sentiment analysis, you shouldn't be surprised when you get a program that embodies racist, discriminatory values. It's possible to minimize algorithmic racism (though possibly not eliminate it entirely), and Speer discusses several strategies for doing so.
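The mechanism behind the "Mexican food" result can be shown in a few lines. This is a minimal sketch, not Speer's code: it swaps the supervised classifier of a typical embeddings-plus-lexicon pipeline for a simpler projection onto a "sentiment axis" (the average positive-lexicon vector minus the average negative-lexicon vector), and the 2-D vectors are invented by hand to mimic the kind of skew that real embeddings can absorb from their training text.

```python
from typing import Dict, List

# Hypothetical 2-D "embeddings" chosen by hand for illustration; real systems
# use high-dimensional GloVe or word2vec vectors learned from large corpora.
EMBEDDINGS: Dict[str, List[float]] = {
    "good": [1.0, 0.2], "great": [0.9, 0.1],
    "bad": [-1.0, 0.3], "awful": [-0.8, 0.2],
    "food": [0.1, 0.9], "italian": [0.3, 0.8], "mexican": [-0.3, 0.8],
}
POSITIVE = ["good", "great"]  # stand-in for a positive sentiment lexicon
NEGATIVE = ["bad", "awful"]   # stand-in for a negative sentiment lexicon


def mean(vectors: List[List[float]]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sentiment_axis() -> List[float]:
    """Direction from the average negative word to the average positive word."""
    pos = mean([EMBEDDINGS[w] for w in POSITIVE])
    neg = mean([EMBEDDINGS[w] for w in NEGATIVE])
    return [p - n for p, n in zip(pos, neg)]


def sentence_score(sentence: str) -> float:
    """Average projection of the known words onto the sentiment axis."""
    axis = sentiment_axis()
    words = [w for w in sentence.lower().split() if w in EMBEDDINGS]
    if not words:
        return 0.0
    return sum(
        sum(v * a for v, a in zip(EMBEDDINGS[w], axis)) for w in words
    ) / len(words)


italian = sentence_score("Let's go get Italian food")
mexican = sentence_score("Let's go get Mexican food")
```

Neither cuisine word appears in the lexicon, yet the two sentences get opposite-signed scores (about 0.29 versus -0.27) purely because of where the cuisine words sit in the vector space. Nothing in the scoring step checks whether a word's position reflects sentiment or merely the contexts it appeared in.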
Author's Sentiment Prediction
Bastan, Mohaddeseh, Koupaee, Mahnaz, Son, Youngseo, Sicoli, Richard, Balasubramanian, Niranjan
We introduce PerSenT, a dataset of crowd-sourced annotations of the sentiment expressed by the authors towards the main entities in news articles. The dataset also includes paragraph-level sentiment annotations to provide more fine-grained supervision for the task. Our benchmarks of multiple strong baselines show that this is a difficult classification task. The results also suggest that simply fine-tuning document-level representations from BERT isn't adequate for this task. Making paragraph-level decisions and aggregating them over the entire document is also ineffective. We present empirical and qualitative analyses that illustrate the specific challenges posed by this dataset. We release this dataset with 5.3k documents and 38k paragraphs covering 3.2k unique entities as a challenge in entity sentiment analysis.
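The "paragraph-level decisions aggregated over the document" baseline can be sketched as a simple majority vote. The label names and the tie-breaking rule (fall back to a neutral label) are assumptions for illustration; the abstract does not specify how the authors aggregate.

```python
from collections import Counter
from typing import List


def aggregate_paragraph_labels(
    paragraph_labels: List[str], tie_label: str = "neutral"
) -> str:
    """Majority vote over paragraph-level sentiment predictions.

    Ties between the most frequent labels (and empty input) fall back to
    tie_label, an arbitrary choice for this sketch.
    """
    if not paragraph_labels:
        return tie_label
    counts = Counter(paragraph_labels).most_common()
    top_label, top_count = counts[0]
    if len(counts) > 1 and counts[1][1] == top_count:
        return tie_label
    return top_label


doc_label = aggregate_paragraph_labels(["positive", "neutral", "positive"])
```

The abstract reports that this style of aggregation is ineffective on PerSenT, so treat it as a baseline to compare against rather than a solution.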
How to Build a Twitter Sentiment Analysis System
In the field of social media data analytics, one popular area of research is the sentiment analysis of Twitter data. Twitter is one of the most popular social media platforms in the world, with 330 million monthly active users and 500 million tweets sent each day. By carefully analyzing the sentiment of these tweets (for example, whether they are positive, negative, or neutral), we can learn a lot about how people feel about certain topics. Understanding the sentiment of tweets matters for many purposes; business marketing, politics, public behavior analysis, and information gathering are just a few examples. Sentiment analysis of Twitter data can help marketers understand customer response to product launches and marketing campaigns, and it can help political parties understand public response to policy changes or announcements.
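The simplest way to sort tweets into positive, negative, or neutral is a lexicon lookup with a neutral band. This is a minimal rule-based sketch; the words and weights in LEXICON and the 0.5 threshold are hypothetical, not taken from any published lexicon.

```python
import re
from typing import Dict

# Hypothetical toy lexicon; real systems use curated lexicons or trained models.
LEXICON: Dict[str, float] = {
    "love": 2.0, "great": 1.5, "good": 1.0,
    "bad": -1.0, "terrible": -2.0, "hate": -2.0,
}


def classify_tweet(text: str, threshold: float = 0.5) -> str:
    # Lowercase, then keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(tok, 0.0) for tok in tokens)
    # Scores inside [-threshold, threshold] are treated as neutral.
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


label = classify_tweet("I love the new launch!")
```

A lexicon approach is easy to inspect but ignores context: negation ("not good") and sarcasm are scored word by word, which is one reason trained models are usually preferred for tweets.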
Spoken Language Interaction with Robots: Research Issues and Recommendations, Report from the NSF Future Directions Workshop
Marge, Matthew, Espy-Wilson, Carol, Ward, Nigel
With robotics rapidly advancing, more effective human-robot interaction is increasingly needed to realize the full potential of robots for society. While spoken language must be part of the solution, our ability to provide spoken language interaction capabilities is still very limited. The National Science Foundation accordingly convened a workshop, bringing together speech, language, and robotics researchers to discuss what needs to be done. The result is this report, in which we identify key scientific and engineering advances needed. Our recommendations broadly relate to eight general themes. First, meeting human needs requires addressing new challenges in speech technology and user experience design. Second, this requires better models of the social and interactive aspects of language use. Third, for robustness, robots need higher-bandwidth communication with users and better handling of uncertainty, including simultaneous consideration of multiple hypotheses and goals. Fourth, more powerful adaptation methods are needed, to enable robots to communicate in new environments, for new tasks, and with diverse user populations, without extensive re-engineering or the collection of massive training data. Fifth, since robots are embodied, speech should function together with other communication modalities, such as gaze, gesture, posture, and motion. Sixth, since robots operate in complex environments, speech components need access to rich yet efficient representations of what the robot knows about objects, locations, noise sources, the user, and other humans. Seventh, since robots operate in real time, their speech and language processing components must also. Eighth, in addition to more research, we need more work on infrastructure and resources, including shareable software modules and internal interfaces, inexpensive hardware, baseline systems, and diverse corpora.
F# and ML.NET Sentiment Analysis
A new version of ML.NET (v0.9.0) has recently been released, so we use this as an opportunity to try out some of its new functionality. The goal of today's post is to perform sentiment analysis on movie reviews from IMDB. Note: ML.NET is still evolving; this post was written using Microsoft.ML v0.9.0. If you don't have .NET Core installed, head over to the .NET Core Downloads page. Alternatively, you can reach the same page by going to dot.net and navigating to Downloads and then .NET Core.
Knowing What You Know: Calibrating Dialogue Belief State Distributions via Ensembles
van Niekerk, Carel, Heck, Michael, Geishauser, Christian, Lin, Hsien-Chin, Lubis, Nurul, Moresi, Marco, Gašić, Milica
The ability to accurately track what happens during a conversation is essential for the performance of a dialogue system. Current state-of-the-art multi-domain dialogue state trackers achieve just over 55% accuracy on the current go-to benchmark, which means that in almost every second dialogue turn they place full confidence in an incorrect dialogue state. Belief trackers, on the other hand, maintain a distribution over possible dialogue states. However, they lag behind dialogue state trackers in performance and do not produce well-calibrated distributions. In this work, we present state-of-the-art performance in calibration for multi-domain dialogue belief trackers using a calibrated ensemble of models. Our resulting dialogue belief tracker also outperforms previous dialogue belief tracking models in terms of accuracy.
Power BI: 5 Key AI Features You Should Start Using
Key Phrase Extraction: Using this function, you can feed large chunks of unstructured text to the system and get back a list of key phrases. Unlike sentiment analysis, this function delivers better results when you provide text in bigger blocks. Language Detection: This function analyzes the input text and returns the ISO identifier and the language name. It can be used to evaluate data columns where the language of the text is not known. About 120 languages are currently supported. Image Tagging: This function supports tagging of 2,000 recognizable objects, living species, environmental settings, and actions.
Tweet Sentiment Extraction
Sentiment analysis can be defined as the process of analyzing text data and categorizing it into positive, negative, or neutral sentiment. Sentiment analysis is used in many settings, such as social media monitoring, customer service, brand monitoring, and political campaigns. Analyzing customer feedback such as social media conversations, product reviews, and survey responses allows companies to better understand their customers' emotions, which is increasingly essential to meeting their needs. It is almost impossible to manually sort thousands of social media conversations, customer reviews, and surveys, so we use machine learning (ML) or deep learning (DL) to build a model that analyzes the text data and performs the required classification. The problem I am trying to solve here is part of this Kaggle competition.
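As a minimal ML example of the three-way classification described above, here is a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written from scratch. The training sentences are hypothetical, and this is a classic baseline, not the model used in the Kaggle write-up.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples: List[Tuple[str, str]]) -> "NaiveBayesSentiment":
        self.class_counts = Counter(label for _, label in examples)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab = set()
        for text, label in examples:
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total = len(examples)
        return self

    def predict(self, text: str) -> Optional[str]:
        tokens = text.lower().split()
        best_label, best_logp = None, -math.inf
        for label, count in self.class_counts.items():
            logp = math.log(count / self.total)  # log prior P(label)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                if tok in self.vocab:  # skip words never seen in training
                    logp += math.log((self.word_counts[label][tok] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Hypothetical toy training data.
TRAIN = [
    ("great product love it", "positive"),
    ("love the support team", "positive"),
    ("terrible service very slow", "negative"),
    ("slow delivery bad packaging", "negative"),
    ("order arrived today", "neutral"),
    ("the store opens today", "neutral"),
]
model = NaiveBayesSentiment().fit(TRAIN)
```

Working in log space avoids numerical underflow when multiplying many small word probabilities, and the +1 smoothing keeps a word that never appeared with a class from zeroing out that class entirely.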
Improving Limited Labeled Dialogue State Tracking with Self-Supervision
Wu, Chien-Sheng, Hoi, Steven, Xiong, Caiming
Existing dialogue state tracking (DST) models require plenty of labeled data. However, collecting high-quality labels is costly, especially when the number of domains increases. In this paper, we address a practical DST problem that is rarely discussed, i.e., learning efficiently with limited labeled data. We present and investigate two self-supervised objectives: preserving latent consistency and modeling conversational behavior. We encourage a DST model to have consistent latent distributions given a perturbed input, making it more robust to an unseen scenario. We also add an auxiliary utterance generation task, modeling a potential correlation between conversational behavior and dialogue states. The experimental results show that our proposed self-supervised signals can improve joint goal accuracy by 8.95% when only 1% labeled data is used on the MultiWOZ dataset. We can achieve an additional 1.76% improvement if some unlabeled data is jointly trained as semi-supervised learning. We analyze and visualize how our proposed self-supervised signals help the DST task and hope to stimulate future data-efficient DST research.
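The latent-consistency objective can be sketched as a divergence penalty between the distributions a model produces for a clean input and for a perturbed copy of it. This sketch assumes word dropout as the perturbation and KL(clean || perturbed) as the penalty; both are illustrative choices, the helper names are hypothetical, and the paper's exact perturbation and the distributions it compares may differ.

```python
import math
import random
from typing import List, Sequence


def word_dropout(
    tokens: List[str], p: float, rng: random.Random, unk: str = "<unk>"
) -> List[str]:
    """Perturb an utterance by replacing each token with `unk` with probability p."""
    return [unk if rng.random() < p else tok for tok in tokens]


def consistency_loss(
    p_clean: Sequence[float], p_perturbed: Sequence[float], eps: float = 1e-12
) -> float:
    """KL(p_clean || p_perturbed) for two distributions over the same support.

    Terms with p_clean == 0 contribute nothing; eps guards against log(0).
    """
    return sum(
        pc * math.log((pc + eps) / (pp + eps))
        for pc, pp in zip(p_clean, p_perturbed)
        if pc > 0
    )


rng = random.Random(0)
utterance = "book a cheap hotel in the north".split()
perturbed = word_dropout(utterance, p=0.2, rng=rng)

# Hypothetical model outputs (e.g. a slot's value distribution) for each input.
loss_same = consistency_loss([0.7, 0.2, 0.1], [0.7, 0.2, 0.1])
loss_shifted = consistency_loss([0.7, 0.2, 0.1], [0.3, 0.4, 0.3])
```

In training, this penalty would be added to the supervised DST loss; because it needs no labels, it can also be computed on unlabeled dialogues, which is what makes it useful in the limited-label and semi-supervised settings the abstract describes.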