Information Extraction
Improving Multimodal Accuracy Through Modality Pre-training and Attention
Training a multimodal network is challenging, and it requires complex architectures to achieve reasonable performance. We show that one reason for this phenomenon is the difference in convergence rates across modalities. We address this by pre-training the modality-specific sub-networks of a multimodal architecture independently before training the entire network end to end. Furthermore, we show that adding an attention mechanism between sub-networks after pre-training helps identify the most important modality in ambiguous scenarios, boosting performance. We demonstrate that, with these two tricks, a simple network can achieve performance similar to that of a complicated architecture that is significantly more expensive to train, on multiple tasks including sentiment analysis, emotion recognition, and speaker trait recognition.
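A minimal sketch of the attention step, assuming each pre-trained modality sub-network has already produced a fixed-size embedding and that attention is plain dot-product scoring against a learned query vector. The names `attention_fusion` and `query`, and the toy numbers, are illustrative assumptions rather than the paper's architecture, and the pre-training phase itself is not shown.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fusion(embeddings: Dict[str, Vector], query: Vector) -> Tuple[Vector, Dict[str, float]]:
    """Fuse per-modality embeddings with dot-product attention.

    embeddings: one vector per modality (e.g. text, audio, video), all the same size,
                assumed to come from independently pre-trained sub-networks.
    query:      a context vector; hypothetical here, learned end to end in practice.
    Returns the attention-weighted sum of embeddings and each modality's weight.
    """
    names = list(embeddings)
    scores = [sum(q * x for q, x in zip(query, embeddings[n])) for n in names]
    weights = softmax(scores)
    fused = [
        sum(w * embeddings[n][i] for w, n in zip(weights, names))
        for i in range(len(query))
    ]
    return fused, dict(zip(names, weights))


# Toy example with made-up 2-d embeddings for three modalities.
fused, weights = attention_fusion(
    {"text": [2.0, 0.0], "audio": [0.0, 1.0], "video": [0.0, 0.0]},
    query=[1.0, 0.0],
)
print(weights)
```

With these toy inputs the text embedding aligns best with the query, so it receives most of the weight; with a learned query, the network could shift weight toward whichever modality is most informative for a given example.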
What we learn from AI's biases
In "How to Make a Racist AI Without Really Trying," Robyn Speer shows how to build a simple sentiment analysis system, using standard, well-known sources for word embeddings (GloVe and word2vec), and a widely used sentiment lexicon. Her program assigns "negative" sentiment to names and phrases associated with minorities, and "positive" sentiment to names and phrases associated with Europeans. Even a sentence like "Let's go get Mexican food" gets a much lower sentiment score than "Let's go get Italian food." That result isn't surprising, nor are Speer's conclusions: if you take a simplistic approach to sentiment analysis, you shouldn't be surprised when you get a program that embodies racist, discriminatory values. It's possible to minimize algorithmic racism (though possibly not eliminate it entirely), and Speer discusses several strategies for doing so.
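To make the mechanism concrete, here is a simplified stand-in, not Speer's code: it scores each word by how much closer its embedding is to the centroid of positive lexicon words than to the centroid of negative ones, and scores a sentence as the mean over its known words. The 2-d vectors are made up, with "mexican" deliberately placed nearer the negative words; the lexicon itself says nothing about cuisine, yet the skew in the vectors flows straight into the sentence scores.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def centroid(vectors: List[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def make_sentence_scorer(emb: Dict[str, Vector], pos_words: List[str],
                         neg_words: List[str]) -> Callable[[str], float]:
    """Word score = cos(word, positive centroid) - cos(word, negative centroid);
    sentence score = mean word score over words that have an embedding."""
    pos_c = centroid([emb[w] for w in pos_words])
    neg_c = centroid([emb[w] for w in neg_words])

    def score(sentence: str) -> float:
        words = [w for w in sentence.lower().split() if w in emb]
        if not words:
            return 0.0
        return sum(cosine(emb[w], pos_c) - cosine(emb[w], neg_c) for w in words) / len(words)

    return score


# Hypothetical toy "embeddings"; real ones (GloVe, word2vec) have hundreds of dimensions.
TOY_EMB = {
    "good": [1.0, 0.0], "great": [0.9, 0.1],
    "bad": [0.0, 1.0], "awful": [0.1, 0.9],
    "italian": [0.6, 0.4], "mexican": [0.4, 0.6], "food": [0.5, 0.5],
}
score = make_sentence_scorer(TOY_EMB, ["good", "great"], ["bad", "awful"])
print(score("Let's go get Italian food"), score("Let's go get Mexican food"))
```

Here "Let's go get Italian food" scores higher than "Let's go get Mexican food" purely because of where the toy vectors place the two cuisine words.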
Towards A Sentiment Analyzer for Low-Resource Languages
Indriani, Dian, Nasution, Arbi Haza, Monika, Winda, Nasution, Salhazan
Twitter is one of the most influential social media platforms, with millions of active users. It is a microblogging service that allows users to share messages, ideas, thoughts, and more. As a result, millions of interactions, such as short messages or tweets, circulate among Twitter users discussing topics happening worldwide. This research aims to analyse users' sentiment towards a particular trending topic that was actively and massively discussed at the time. We chose the hashtag #kpujangancurang, which was the trending topic during the 2019 Indonesian presidential election. We use the hashtag to collect a dataset from Twitter and investigate the positive or negative sentiment expressed in users' tweets. This research uses the RapidMiner tool to collect the Twitter data and compares Naive Bayes, K-Nearest Neighbor, Decision Tree, and Multi-Layer Perceptron classifiers for classifying the sentiment of the tweets. The experiment uses 200 labeled samples in total. Overall, the Naive Bayes and Multi-Layer Perceptron classifiers outperformed the other two methods in 11 experiments with different training/testing split sizes. These two classifiers show potential for building sentiment analyzers for low-resource languages with small corpora.
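The study itself used RapidMiner; as an independent sketch of what one of its two best-performing classifiers does, here is a from-scratch multinomial Naive Bayes with add-one smoothing, plus a helper that evaluates a single train/test split. RapidMiner's Naive Bayes may differ in details such as smoothing and tokenization. The toy tweets and labels are made up; a real run would shuffle the labeled samples first and repeat the split at several ratios.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer (deliberately simple)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs: List[str], labels: List[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, doc: str) -> Optional[str]:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for tok in tokenize(doc):
                if tok in self.vocab:  # skip words never seen in training
                    score += math.log((self.word_counts[c][tok] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def split_accuracy(docs: List[str], labels: List[str], train_ratio: float) -> float:
    """Train on the first `train_ratio` of the data and report accuracy on the rest."""
    n_train = int(len(docs) * train_ratio)
    if not 0 < n_train < len(docs):
        raise ValueError("train_ratio must leave at least one item in each split")
    model = NaiveBayes().fit(docs[:n_train], labels[:n_train])
    test = list(zip(docs[n_train:], labels[n_train:]))
    return sum(model.predict(d) == y for d, y in test) / len(test)


# Made-up toy data, for illustration only.
docs = ["great honest count", "fair and honest", "cheating again", "fraud cheating count"]
labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(docs, labels)
print(model.predict("honest count"), model.predict("cheating"))
```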
PhD Position: Human-Centered Information Extraction from City Archival Data
The Knowledge and Intelligence Design section in the Department of Sustainable Design Engineering of the Faculty of Industrial Design Engineering (IDE) offers a PhD position for a duration of four years. The PhD candidate will be supervised by Prof. Alessandro Bozzon. The research work will be conducted in the context of a collaboration between TU Delft, the Amsterdam City Archive, and the CTO Office of the Municipality of Amsterdam. The goal is to investigate human-centered artificial intelligence methods for the preservation of large collections of archival documents, which are a valuable source of knowledge for cultural, social, and urban research on a given city. To fully unlock the knowledge contained in the archives and facilitate the exploration and exploitation of the collections, techniques are needed to digitize the archives; to extract structured data, namely Named Entities (NEs) such as persons, locations, and events, from unstructured archival documents; and to link the extracted entities to knowledge bases.
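The extraction and linking steps can be pictured with a deliberately simple stand-in: dictionary (gazetteer) lookup, which performs both in one pass. Practical systems would typically use trained NER and entity-linking models instead. The gazetteer, its entity types, the `kb:` identifiers, and the example sentence below are all hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical gazetteer: surface form -> (entity type, knowledge-base ID).
# The "kb:" IDs are invented placeholders, not identifiers from a real knowledge base.
GAZETTEER: Dict[str, Tuple[str, str]] = {
    "amsterdam": ("LOCATION", "kb:loc/amsterdam"),
    "keizersgracht": ("LOCATION", "kb:loc/keizersgracht"),
    "jan de vries": ("PERSON", "kb:per/jan-de-vries"),
}


def extract_and_link(text: str,
                     gazetteer: Dict[str, Tuple[str, str]] = GAZETTEER) -> List[Tuple[str, str, str]]:
    """Return (mention, entity type, KB ID) for every gazetteer match, in text order.

    Matching is case-insensitive and on word boundaries. Longer names are tried first,
    and characters already claimed by a match are not reused, so a shorter name
    cannot split a longer one.
    """
    claimed = [False] * len(text)
    matches = []
    for name in sorted(gazetteer, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            if any(claimed[m.start():m.end()]):
                continue
            for i in range(m.start(), m.end()):
                claimed[i] = True
            entity_type, kb_id = gazetteer[name]
            matches.append((m.start(), m.group(0), entity_type, kb_id))
    matches.sort()
    return [(mention, etype, kb_id) for _, mention, etype, kb_id in matches]


# Made-up sentence standing in for a transcribed archival record.
print(extract_and_link("Jan de Vries lived on the Keizersgracht in Amsterdam."))
```

Lookup like this cannot tell apart two different people who share a name, which is one reason learned entity-linking methods are preferred for real collections.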
Author's Sentiment Prediction
Bastan, Mohaddeseh, Koupaee, Mahnaz, Son, Youngseo, Sicoli, Richard, Balasubramanian, Niranjan
We introduce PerSenT, a dataset of crowd-sourced annotations of the sentiment expressed by the authors towards the main entities in news articles. The dataset also includes paragraph-level sentiment annotations to provide more fine-grained supervision for the task. Our benchmarks of multiple strong baselines show that this is a difficult classification task. The results also suggest that simply fine-tuning document-level representations from BERT isn't adequate for this task. Making paragraph-level decisions and aggregating them over the entire document is also ineffective. We present empirical and qualitative analyses that illustrate the specific challenges posed by this dataset. We release this dataset with 5.3k documents and 38k paragraphs covering 3.2k unique entities as a challenge in entity sentiment analysis.
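The abstract reports that making paragraph-level decisions and aggregating them is ineffective, but does not say how the aggregation was done. As an assumption, here is the simplest such rule: a majority vote over paragraph labels, with ties (and empty input) mapped to a default of neutral. The function name and tie-breaking policy are hypothetical.

```python
from collections import Counter
from typing import List


def aggregate_paragraph_labels(paragraph_labels: List[str], default: str = "neutral") -> str:
    """Document label = most frequent paragraph label; ties and empty input give `default`."""
    if not paragraph_labels:
        return default
    ranked = Counter(paragraph_labels).most_common()
    top_label, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return default  # tie between the two most frequent labels
    return top_label


print(aggregate_paragraph_labels(["positive", "neutral", "positive", "negative"]))
print(aggregate_paragraph_labels(["positive", "negative"]))
print(aggregate_paragraph_labels(["neutral", "neutral", "neutral", "negative"]))
```

The last call hints at one possible weakness of such a rule: a single paragraph that carries the author's stance can be outvoted by neutral, factual reporting.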
How to Build a Twitter Sentiment Analysis System
In the field of social media data analytics, one popular area of research is the sentiment analysis of Twitter data. Twitter is one of the most popular social media platforms in the world, with 330 million monthly active users and 500 million tweets sent each day. By carefully analyzing the sentiment of these tweets (for example, whether they are positive, negative, or neutral), we can learn a lot about how people feel about certain topics. Understanding the sentiment of tweets is important for a variety of reasons: business marketing, politics, public behavior analysis, and information gathering are just a few examples. Sentiment analysis of Twitter data can help marketers understand the customer response to product launches and marketing campaigns, and it can also help political parties understand the public response to policy changes or announcements.
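A minimal lexicon-based sketch of the first steps such a system might take: strip tweet-specific noise (URLs, @mentions, the "#" of hashtags) and label each tweet positive, negative, or neutral by summing word polarities. The six-word `LEXICON` is a hypothetical placeholder; a real system would use a full sentiment lexicon or a trained classifier.

```python
import re
from typing import Dict, List

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON: Dict[str, int] = {
    "love": 1, "great": 1, "happy": 1,
    "hate": -1, "awful": -1, "sad": -1,
}


def clean_tweet(tweet: str) -> List[str]:
    """Remove URLs and @mentions, keep hashtag words without '#', and tokenize."""
    tweet = re.sub(r"https?://\S+", " ", tweet)  # drop links
    tweet = re.sub(r"@\w+", " ", tweet)  # drop user mentions
    tweet = tweet.replace("#", " ")  # '#happy' -> 'happy'
    return re.findall(r"[a-z']+", tweet.lower())


def classify(tweet: str) -> str:
    """Label a tweet by the sign of its summed lexicon score."""
    score = sum(LEXICON.get(tok, 0) for tok in clean_tweet(tweet))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for t in ["I love the new phone! https://t.co/abc @brand #happy",
          "Awful launch event @brand",
          "The launch is today"]:
    print(classify(t), clean_tweet(t))
```

Negation ("not great") and sarcasm are not handled by a scorer this simple.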
F# and ML.NET Sentiment Analysis
A new version of ML.NET (v0.9.0) has recently been released, so we use this as an opportunity to play with some new functionality. The goal of today's post is to perform sentiment analysis on movie reviews from IMDB. Note: ML.NET is still evolving; this post was written using Microsoft.ML v0.9.0. If you don't have .NET Core installed, head over to the .NET Core Downloads page. Tangentially, you can also get there by going to dot.net, then navigating to Downloads and .NET Core.
Power BI: 5 Key AI Features You Should Start Using
Key Phrase Extraction: Using this function, you can feed big chunks of unstructured text to the system and get a list of key phrases. Unlike sentiment analysis, this function delivers better results when you provide text in bigger blocks.
Language Detection: This function analyzes the input text and returns the ISO identifier and the language name. It can be used to evaluate data columns in which the language of the text is not known. About 120 languages are currently supported.
Image Tagging: This function supports tagging of 2,000 recognizable objects, living species, environmental settings, and actions.
Tweet Sentiment Quantification: An Experimental Re-Evaluation
Moreo, Alejandro, Sebastiani, Fabrizio
Sentiment quantification is the task of estimating the relative frequency (or "prevalence") of sentiment-related classes (such as Positive, Neutral, Negative) in a sample of unlabelled texts; this is especially important when these texts are tweets, since most sentiment classification endeavours carried out on Twitter data actually have quantification (and not the classification of individual tweets) as their ultimate goal. It is well known that solving quantification via "classify and count" (i.e., by classifying all unlabelled items via a standard classifier and counting the items that have been assigned to a given class) is suboptimal in terms of accuracy, and that more accurate quantification methods exist. In 2016, Gao and Sebastiani carried out a systematic comparison of quantification methods on the task of tweet sentiment quantification. In hindsight, we observe that the experimental protocol followed in that work is flawed, and that its results are thus unreliable. We now re-evaluate those quantification methods on the very same datasets, this time following a now-consolidated and much more robust experimental protocol that involves 5775 times as many experiments as were run in the original study. Our experimentation yields results dramatically different from those obtained by Gao and Sebastiani, and thus provides a different, much more solid understanding of the relative strengths and weaknesses of different sentiment quantification methods.
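To illustrate why classify and count is biased and how it can be corrected, here is adjusted classify and count (ACC), one well-known quantification method, in the binary case (for example, Positive vs. not Positive). If a classifier has true positive rate tpr and false positive rate fpr, the expected share of items it labels positive is p_CC = tpr * p + fpr * (1 - p), where p is the true prevalence; solving for p gives p_ACC = (p_CC - fpr) / (tpr - fpr). This is an illustrative sketch; the abstract does not list which methods were compared, and tpr and fpr would have to be estimated on held-out labelled data.

```python
from typing import List


def classify_and_count(predictions: List[int]) -> float:
    """CC: share of unlabelled items the classifier assigns to the positive class (1)."""
    return sum(predictions) / len(predictions)


def adjusted_classify_and_count(predictions: List[int], tpr: float, fpr: float) -> float:
    """ACC: invert p_CC = tpr * p + fpr * (1 - p) to estimate the true prevalence p.

    tpr and fpr are the classifier's true/false positive rates, assumed to have been
    estimated beforehand on held-out labelled data.
    """
    if tpr == fpr:
        raise ValueError("tpr and fpr must differ for the correction to be defined")
    p_cc = classify_and_count(predictions)
    p = (p_cc - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))  # clip to a valid prevalence in [0, 1]


# Hypothetical sample: 100 tweets, of which the classifier flags 24 as positive.
preds = [1] * 24 + [0] * 76
print(classify_and_count(preds), adjusted_classify_and_count(preds, tpr=0.8, fpr=0.1))
```

With tpr = 0.8, fpr = 0.1, and a true prevalence of 0.2, a classifier would be expected to flag 0.8 * 0.2 + 0.1 * 0.8 = 24% of items; CC reports 0.24, while ACC recovers 0.2.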
Tweet Sentiment Extraction
Sentiment Analysis can be defined as the process of analyzing text data and categorizing it into Positive, Negative, or Neutral sentiment. Sentiment Analysis is used in many areas, such as social media monitoring, customer service, brand monitoring, and political campaigns. Analyzing customer feedback, such as social media conversations, product reviews, and survey responses, allows companies to better understand their customers' emotions, which is becoming increasingly essential to meeting their needs. It is almost impossible to manually sort thousands of social media conversations, customer reviews, and surveys, so we have to use machine learning (ML) or deep learning (DL) to build a model that analyzes the text data and performs the required operations. The problem I am trying to solve here is part of this Kaggle competition.