word cloud


Summarizing Economic Bulletin Documents with TF-IDF

#artificialintelligence

A key strength of NLP (natural language processing) is the ability to process large amounts of text and summarise it to extract meaningful insights. In this example, a selection of economic bulletins in PDF format from 2018 to 2019 is analysed in order to gauge economic sentiment. The bulletins in question are sourced from the European Central Bank website. As a disclaimer, the examples below are used solely to illustrate the use of natural language processing techniques for educational purposes. This is not intended as a formal economic summary in any business context.
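As a rough illustration of the TF-IDF weighting named in the title, here is a minimal sketch in plain Python. It is not the article's own code: the function names `tf_idf` and `top_terms` are hypothetical, the three sentences are invented stand-ins rather than text from the ECB bulletins, and it assumes the basic variant where term frequency is the relative count within a document and inverse document frequency is log(N / df).

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(doc_weights, k=3):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Invented toy sentences, NOT taken from the ECB bulletins.
docs = [
    "inflation rose while growth slowed",
    "growth slowed as trade tensions rose",
    "the labour market remained strong while growth slowed",
]
weights = tf_idf(docs)
summary_terms = [top_terms(w, k=2) for w in weights]
```

Terms that occur in every document, such as "growth" here, receive a weight of zero, so the highest-weighted terms are the ones that set a document apart from the rest of the collection.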


Correlating Twitter Language with Community-Level Health Outcomes

arXiv.org Machine Learning

We study how language on social media is linked to diseases such as atherosclerotic heart disease (AHD), diabetes and various types of cancer. Our proposed model leverages state-of-the-art sentence embeddings, followed by a regression model and clustering, without the need for additional labelled data. It allows us to predict community-level medical outcomes from language, and thereby potentially translate these to the individual level. The method is applicable to a wide range of target variables and allows us to discover known and potentially novel correlations of medical outcomes with lifestyle aspects and other socioeconomic risk factors.


Summarizing Economic Bulletin Documents with tf-idf

#artificialintelligence

In this example, a selection of economic bulletins in PDF format from 2018 to 2019 is analysed in order to gauge economic sentiment. The bulletins in question are sourced from the European Central Bank website. As a disclaimer, the examples below are used solely to illustrate the use of natural language processing techniques for educational purposes. This is not intended as a formal economic summary in any business context. Firstly, pdf2txt is used to convert the PDF files into text format in a Linux shell.


Lemotif: Abstract Visual Depictions of your Emotional States in Life

arXiv.org Artificial Intelligence

We present Lemotif. Lemotif generates a motif for your emotional life. You tell Lemotif a little bit about your day -- what were salient events or aspects and how they made you feel. Lemotif will generate a lemotif -- a creative abstract visual depiction of your emotions and their sources. Over time, Lemotif can create visual motifs to capture a summary of your emotional states over arbitrary periods of time -- making patterns in your emotions and their sources apparent, presenting opportunities to take actions, and measure their effectiveness. The underlying principles in Lemotif are that the lemotif should (1) separate out the sources of the emotions, (2) depict these sources visually, (3) depict the emotions visually, and (4) have a creative aspect to them. We verify via human studies that each of these factors contributes to the proposed lemotifs being favored over corresponding baselines.


Understanding hidden memories of recurrent neural networks

#artificialintelligence

Understanding hidden memories of recurrent neural networks, Ming et al., VAST'17. Last week we looked at CORALS, winner of round 9 of the Yelp dataset challenge. Today's paper choice was a winner in round 10. We're used to visualisations of CNNs, which give interpretations of what is being learned in the hidden layers. But the inner workings of Recurrent Neural Networks (RNNs) have remained something of a mystery. RNNvis is a tool for visualising and exploring RNN models.


LDA for Text Summarization and Topic Detection - DZone AI

#artificialintelligence

Machine learning clustering techniques are not the only way to extract topics from a text data set. Text mining literature has proposed a number of statistical models, known as probabilistic topic models, to detect topics from an unlabeled set of documents. One of the most popular models is the latent Dirichlet allocation (LDA) algorithm developed by Blei, Ng, and Jordan [i]. LDA is a generative unsupervised probabilistic algorithm that isolates the top K topics in a data set as described by the most relevant N keywords. In other words, the documents in the data set are represented as random mixtures of latent topics, where each topic is characterized by a probability distribution over a fixed vocabulary and Dirichlet priors govern these distributions.
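To make the "mixture of latent topics" idea concrete, here is a small sketch of LDA fitted with collapsed Gibbs sampling in plain Python. This is an assumption-laden illustration, not the article's implementation: Gibbs sampling is swapped in for simplicity (Blei, Ng, and Jordan describe variational inference), the function name `lda_gibbs` and the hyperparameter defaults are hypothetical, and the token lists are invented.

```python
import random


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch only).

    docs is a list of token lists. Returns (theta, phi, vocab), where
    theta[d][k] is the share of topic k in document d and phi[k][v] is
    the probability of vocabulary word v under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # counts per doc/topic
    topic_word = [[0] * n_words for _ in range(n_topics)]  # counts per topic/word
    topic_total = [0] * n_topics                           # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic given all other tokens.
                probs = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=probs)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimates of the document-topic and topic-word distributions.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(c + beta) / (topic_total[t] + n_words * beta) for c in topic_word[t]]
        for t in range(n_topics)
    ]
    return theta, phi, vocab


# Invented toy corpus, already tokenized.
docs = [
    ["bank", "rate", "inflation", "rate"],
    ["inflation", "bank", "growth"],
    ["neural", "network", "training"],
    ["network", "training", "neural", "layer"],
]
theta, phi, vocab = lda_gibbs(docs, n_topics=2)
```

Each row of `theta` is one document's mixture over the K topics, and sorting a row of `phi` in descending order gives the N most relevant keywords for that topic. On a corpus this small the result depends heavily on the seed and the priors, so the output should be read as a demonstration of the data structures rather than a meaningful topic model.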


YouTube AV 50K: an Annotated Corpus for Comments in Autonomous Vehicles

arXiv.org Artificial Intelligence

With one billion monthly viewers, and millions of users discussing and sharing opinions, comments below YouTube videos are rich sources of data for opinion mining and sentiment analysis. We introduce the YouTube AV 50K dataset, a freely available collection of more than 50,000 YouTube comments and metadata below autonomous vehicle (AV)-related videos. We describe its creation process, its content and data format, and discuss its possible uses. In particular, we conduct a case study of the first self-driving car fatality to evaluate the dataset, and show how we can use this dataset to better understand public attitudes toward self-driving cars and public reactions to the accident. Future developments of the dataset are also discussed.


Statistical Analysis on E-Commerce Reviews, with Sentiment Classification using Bidirectional Recurrent Neural Network (RNN)

arXiv.org Machine Learning

Understanding customer sentiments is of paramount importance in marketing strategies today. Not only will it give companies an insight as to how customers perceive their products and/or services, but it will also give them an idea of how to improve their offers. This paper attempts to understand the correlation of different variables in customer reviews on a women's clothing e-commerce site, and to classify whether each review recommends the reviewed product and whether it expresses positive, negative, or neutral sentiment. To achieve these goals, we employed univariate and multivariate analyses on dataset features except for review titles and review texts, and we implemented a bidirectional recurrent neural network (RNN) with long short-term memory (LSTM) units for recommendation and sentiment classification. Results have shown that a recommendation is a strong indicator of a positive sentiment score, and vice versa. On the other hand, ratings in product reviews are fuzzy indicators of sentiment scores. We also found that the bidirectional LSTM was able to reach an F1-score of 0.88 for recommendation classification, and 0.93 for sentiment classification.


Text mining with R Udemy

@machinelearnbot

Have you always wanted to mine Twitter data? Then this course is for you. This course presents an example of text mining with R. Twitter text from @pycon and @udemy is used as the data to analyze. It starts by extracting text from Twitter. The extracted text is then transformed into a corpus and then a document-term matrix.
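The corpus to document-term matrix step can be sketched as follows. The course itself works in R; this Python version is only an illustration of the data structure, with a hypothetical `document_term_matrix` helper, whitespace tokenization as a simplifying assumption, and invented example texts rather than real tweets.

```python
from collections import Counter


def document_term_matrix(corpus):
    """Return (vocab, rows): one row of term counts per document."""
    tokenized = [text.lower().split() for text in corpus]
    # Columns are the sorted set of all terms seen in the corpus.
    vocab = sorted({term for tokens in tokenized for term in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


# Invented example texts, NOT real tweets.
corpus = [
    "python conference talks",
    "python course on text mining",
    "text mining course",
]
vocab, dtm = document_term_matrix(corpus)
```

Each row of `dtm` corresponds to one document and each column to one term in `vocab`, which is the shape that downstream steps such as frequency analysis or clustering typically expect.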


Qualitative Data Science: Using RQDA to analyse interviews

#artificialintelligence

Qualitative data science sounds like a contradiction in terms. Data scientists generally solve problems using numerical solutions. Even the analysis of text is reduced to a numerical problem using Markov chains, topic analysis, sentiment analysis and other mathematical tools. Scientists and professionals consider numerical methods the gold standard of analysis. There is, however, a price to pay when relying on numbers alone.