

Beyond the lyrics: the intersection of music and data visualization

#artificialintelligence

One of the things I love most about data science is stumbling upon some of the coolest and most powerful tools, tools that instantly enable me to do cool and powerful things. I recently created a Spotify developer account, learned how to scrape lyrics, learned about sentiment analysis, word clouds, and other NLP concepts, and taught myself how to use an awesome interactive data visualization tool called Tableau. And, most importantly of all, I now have a long list of further projects I want to tackle next. I would be remiss not to give credit to the two Medium articles that helped me stumble in a productive, ever forward-moving direction. The first one is called "Extracting Spotify data on your favourite artist via Python" by Rare Loot.
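A minimal sketch of the Spotify half of that workflow, using the spotipy client library (my choice here, not necessarily the article's; the credentials and artist below are placeholders):

```python
# Pull an artist's top tracks from the Spotify Web API via spotipy.
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
    client_id="YOUR_CLIENT_ID",          # placeholder developer credentials
    client_secret="YOUR_CLIENT_SECRET"))

results = sp.search(q="artist:Radiohead", type="artist", limit=1)
artist = results["artists"]["items"][0]

for track in sp.artist_top_tracks(artist["id"])["tracks"]:
    print(track["name"], "-", track["popularity"])
```

From there, the lyrics for each track can be scraped and fed into sentiment analysis or a word cloud.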


AI Augmentation: The Real Future of Artificial Intelligence

#artificialintelligence

While artificial intelligence continues to drive completely autonomous technologies, its real value comes in enhancing the capabilities of the people who use it. I love Grammarly, the writing-correction software from Grammarly, Inc. As a writer, it has proved invaluable to me time and time again, popping up quietly to say that I forgot a comma, got a bit too verbose in a sentence, or used too many adverbs. I even sprang for the professional version. Besides endorsing it, I bring Grammarly up for another reason.


The Ultimate Beginner's Guide to Data Scraping, Cleaning, and Visualization

#artificialintelligence

If you have a model that has acceptable results but isn't amazing, take a look at your data! Taking the time to clean and preprocess your data the right way can make your model a star. To examine scraping and preprocessing in more detail, let's look at some of the work that went into "You Are What You Tweet: Detecting Depression in Social Media via Twitter Usage." That way, we can walk through the process of scraping Tweets and then cleaning and preprocessing them. We'll also do a little exploratory visualization, which is an awesome way to get a better sense of what your data looks like!
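As a taste of the cleaning step, here is a small, generic tweet-normalization sketch (the regex rules are my own stand-ins, not necessarily the article's exact pipeline):

```python
import re

def clean_tweet(text: str) -> str:
    """Lowercase a raw tweet and strip URLs, @mentions, and punctuation."""
    text = text.lower()
    text = re.sub(r"http\S+|www\.\S+", "", text)  # drop URLs
    text = re.sub(r"@\w+", "", text)              # drop @mentions
    text = re.sub(r"#", "", text)                 # keep hashtag words, drop '#'
    text = re.sub(r"[^a-z\s]", "", text)          # drop punctuation and digits
    return re.sub(r"\s+", " ", text).strip()      # collapse whitespace

print(clean_tweet("Feeling down today... @friend http://t.co/xyz #sad"))
# -> feeling down today sad
```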


Summarizing Economic Bulletin Documents with TF-IDF

#artificialintelligence

A key strength of NLP (natural language processing) is its ability to process large amounts of text and then summarise it to extract meaningful insights. In this example, a selection of economic bulletins in PDF format from 2018 to 2019 is analysed in order to gauge economic sentiment. The bulletins in question are sourced from the European Central Bank website. As a disclaimer, the examples below are used solely to illustrate natural language processing techniques for educational purposes; this is not intended as a formal economic summary in any business context. First, pdf2txt is used to convert the PDF files into text format using a Linux shell.
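A compact sketch of that pipeline: convert the PDFs with pdf2txt.py (which ships with pdfminer.six), then score sentences by their mean tf-idf weight with scikit-learn. The filename and the naive sentence splitter are my own stand-ins:

```python
# Shell step (pdfminer.six): pdf2txt.py -o bulletin.txt bulletin.pdf
import re
from sklearn.feature_extraction.text import TfidfVectorizer

text = open("bulletin.txt").read()            # hypothetical converted file
sentences = re.split(r"(?<=[.!?])\s+", text)  # naive sentence split

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(sentences)              # sentence x term matrix

# Rank sentences by mean tf-idf weight; the top few serve as a crude summary.
scores = X.mean(axis=1).A.ravel()
for i in sorted(scores.argsort()[::-1][:3]):  # print in document order
    print(sentences[i])
```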


Correlating Twitter Language with Community-Level Health Outcomes

arXiv.org Machine Learning

We study how language on social media is linked to diseases such as atherosclerotic heart disease (AHD), diabetes, and various types of cancer. Our proposed model leverages state-of-the-art sentence embeddings, followed by a regression model and clustering, without the need for additional labelled data. It allows us to predict community-level medical outcomes from language, and thereby potentially translate these to the individual level. The method is applicable to a wide range of target variables and allows us to discover known and potentially novel correlations of medical outcomes with lifestyle aspects and other socioeconomic risk factors.
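A toy sketch of that recipe, with sentence-transformers for the embeddings and scikit-learn for the regression and clustering (the posts, model choice, and outcome values are all invented for illustration; this is not the paper's actual setup):

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import Ridge
from sklearn.cluster import KMeans

model = SentenceTransformer("all-MiniLM-L6-v2")

# Toy communities: average the embeddings of each community's posts.
posts = {
    "county_a": ["another double shift, grabbed fast food again",
                 "too tired to cook tonight"],
    "county_b": ["morning run by the lake felt great",
                 "big farmers market haul this week"],
}
X = np.vstack([model.encode(p).mean(axis=0) for p in posts.values()])
y = np.array([0.12, 0.07])  # invented community-level AHD rates

reg = Ridge(alpha=1.0).fit(X, y)                    # language -> outcome
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(reg.predict(X), labels)
```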


Lemotif: Abstract Visual Depictions of your Emotional States in Life

arXiv.org Artificial Intelligence

We present Lemotif. Lemotif generates a motif for your emotional life. You tell Lemotif a little bit about your day -- what the salient events or aspects were and how they made you feel. Lemotif will generate a lemotif -- a creative abstract visual depiction of your emotions and their sources. Over time, Lemotif can create visual motifs to capture a summary of your emotional states over arbitrary periods of time -- making patterns in your emotions and their sources apparent, and presenting opportunities to take actions and measure their effectiveness. The underlying principles in Lemotif are that the lemotif should (1) separate out the sources of the emotions, (2) depict these sources visually, (3) depict the emotions visually, and (4) have a creative aspect to it. We verify via human studies that each of these factors contributes to the proposed lemotifs being favored over corresponding baselines.


Understanding hidden memories of recurrent neural networks

#artificialintelligence

Understanding hidden memories of recurrent neural networks, Ming et al., VAST'17. Last week we looked at CORALS, winner of round 9 of the Yelp dataset challenge. Today's paper choice was a winner in round 10. We're used to visualisations of CNNs, which give interpretations of what is being learned in the hidden layers. But the inner workings of Recurrent Neural Networks (RNNs) have remained something of a mystery. RNNvis is a tool for visualising and exploring RNN models.
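For intuition, the raw material such tools visualise is the per-timestep hidden state of the network; here is a minimal PyTorch sketch (toy model and data, unrelated to the actual RNNvis implementation):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(1, 5, 8)              # one sequence: 5 timesteps, 8 features

output, (h_n, c_n) = rnn(x)           # output holds the hidden state at every step
states = output.squeeze(0).detach()   # shape (5, 16): timestep x hidden unit

# Each column traces one hidden unit over time; plotting or clustering these
# traces is the starting point for interpreting what the RNN has memorised.
print(states.shape)
print(states[:, 0])                   # trajectory of hidden unit 0
```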


LDA for Text Summarization and Topic Detection - DZone AI

#artificialintelligence

Machine learning clustering techniques are not the only way to extract topics from a text data set. The text mining literature has proposed a number of statistical models, known as probabilistic topic models, to detect topics in an unlabeled set of documents. One of the most popular is the latent Dirichlet allocation (LDA) algorithm developed by Blei, Ng, and Jordan [i]. LDA is a generative, unsupervised probabilistic algorithm that isolates the top K topics in a data set, each described by its N most relevant keywords. In other words, the documents in the data set are represented as random mixtures of latent topics, where each topic is characterized by a distribution over a fixed vocabulary, drawn from a Dirichlet prior.
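A small runnable illustration of LDA with scikit-learn (a toy corpus and K=2 chosen for brevity; the article may use a different toolkit):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "stocks rally as markets recover from the slump",
    "central bank holds interest rates steady",
    "new vaccine trial shows promising results",
    "hospital staff brace for a long flu season",
]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)                     # document x word counts

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):     # topic-word weights
    top = topic.argsort()[::-1][:4]             # N = 4 keywords per topic
    print(f"topic {k}:", ", ".join(terms[i] for i in top))
```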


YouTube AV 50K: an Annotated Corpus for Comments in Autonomous Vehicles

arXiv.org Artificial Intelligence

With one billion monthly viewers, and millions of users discussing and sharing opinions, comments below YouTube videos are a rich source of data for opinion mining and sentiment analysis. We introduce the YouTube AV 50K dataset, a freely available collection of more than 50,000 YouTube comments and metadata from autonomous vehicle (AV)-related videos. We describe its creation process, its content and data format, and discuss its possible usages. In particular, we present a case study of the first self-driving car fatality to evaluate the dataset, and show how it can be used to better understand public attitudes toward self-driving cars and public reactions to the accident. Future developments of the dataset are also discussed.
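One natural starting point for such a corpus is lexicon-based sentiment scoring; a hedged sketch with NLTK's VADER, which is tuned for short social-media text (the comments below are invented, not drawn from the dataset):

```python
import nltk
nltk.download("vader_lexicon", quiet=True)
from nltk.sentiment import SentimentIntensityAnalyzer

comments = [
    "Self-driving cars are going to save so many lives!",
    "After that crash I don't trust autonomous vehicles at all.",
]
sia = SentimentIntensityAnalyzer()
for c in comments:
    # compound ranges from -1 (most negative) to +1 (most positive)
    print(f"{sia.polarity_scores(c)['compound']:+.2f}", c)
```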