If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."
However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …
One of the things I love most about data science is stumbling upon some of the coolest and most powerful tools which instantly enable me to do cool and powerful things. I recently created a Spotify developer account, learned how to scrape lyrics, learned about sentiment analysis, word clouds, and other NLP concepts, and taught myself how to use an awesome interactive data visualization tool called Tableau. Most importantly, I have a long list of further projects I want to tackle next. I would be remiss not to give credit to the two Medium articles that helped me stumble in a productive, ever forward-moving direction. The first one is called "Extracting Spotify data on your favourite artist via Python" by Rare Loot.
While artificial intelligence continues to drive completely autonomous technologies, its real value comes in enhancing the capabilities of the people who use it. I love Grammarly, the writing correction software from Grammarly, Inc. As a writer, it has proved invaluable to me time and time again, popping up quietly to say that I forgot a comma, got a bit too verbose on a sentence, or have used too many adverbs. I even sprang for the professional version. Besides endorsing it, I bring Grammarly up for another reason.
If you have a model that has acceptable results but isn't amazing, take a look at your data! Taking the time to clean and preprocess your data the right way can make your model a star. In order to look at scraping and preprocessing in more detail, let's look at some of the work that went into "You Are What You Tweet: Detecting Depression in Social Media via Twitter Usage." That way, we can really examine the process of scraping Tweets and then cleaning and preprocessing them. We'll also do a little exploratory visualization, which is an awesome way to get a better sense of what your data looks like!
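To make the cleaning step concrete, here is a minimal sketch of the kind of tweet preprocessing described above. It is not the pipeline from the referenced project; the function name `clean_tweet` and the specific rules (lowercasing, dropping URLs and @mentions, keeping hashtag words, removing non-letters) are assumptions chosen for illustration.

```python
import re

# Patterns are illustrative assumptions, not the original project's rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text):
    """Return a lowercased tweet with URLs, mentions, and punctuation removed."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # drop links
    text = MENTION_RE.sub(" ", text)    # drop @user handles
    text = text.replace("#", " ")       # keep the hashtag word, drop the symbol
    text = NON_ALPHA_RE.sub(" ", text)  # drop digits, punctuation, emoji
    return " ".join(text.split())       # collapse repeated whitespace


print(clean_tweet("RT @user: Feeling SO tired today... #nosleep https://t.co/abc123"))
```

Rules this aggressive also split contractions (for example, "I'm" becomes "i m"), so a real pipeline would likely expand contractions or keep apostrophes before this step.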
A key strength of NLP (natural language processing) is being able to process large amounts of texts and then summarise them to extract meaningful insights. In this example, a selection of economic bulletins in PDF format from 2018 to 2019 are analysed in order to gauge economic sentiment. The bulletins in question are sourced from the European Central Bank website. As a disclaimer, the below examples are used solely to illustrate the use of natural language processing techniques for educational purposes. This is not intended as a formal economic summary in any business context.
We study how language on social media is linked to diseases such as atherosclerotic heart disease (AHD), diabetes and various types of cancer. Our proposed model leverages state-of-the-art sentence embeddings, followed by a regression model and clustering, without the need for additional labelled data. It allows us to predict community-level medical outcomes from language, and thereby potentially translate these to the individual level. The method is applicable to a wide range of target variables and allows us to discover known and potentially novel correlations of medical outcomes with lifestyle aspects and other socioeconomic risk factors.
In this example, a selection of economic bulletins in PDF format from 2018 to 2019 are analysed in order to gauge economic sentiment. The bulletins in question are sourced from the European Central Bank website. As a disclaimer, the below examples are used solely to illustrate the use of natural language processing techniques for educational purposes. This is not intended as a formal economic summary in any business context. First, pdf2txt is used to convert the PDF files into text format from a Linux shell.
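Once the bulletins are available as plain text, one simple way to gauge sentiment is to count words from positive and negative word lists. The sketch below is an assumption for illustration only: the article does not say which method it uses, and the word lists and the `sentiment_score` function are hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy lexicons; a real analysis would use a published lexicon.
POSITIVE = {"growth", "improve", "improved", "strong", "recovery", "expansion"}
NEGATIVE = {"decline", "weak", "risk", "risks", "slowdown", "uncertainty"}


def sentiment_score(text):
    """Return (positive - negative) / (positive + negative), or 0.0 if no hits."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    pos = sum(counts[word] for word in POSITIVE)
    neg = sum(counts[word] for word in NEGATIVE)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


print(sentiment_score("Growth remained strong despite risks to the outlook."))
```

The score ranges from -1 (only negative terms) to 1 (only positive terms). Plain word counting ignores negation and context, so it is best treated as a rough first pass.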
We present Lemotif. Lemotif generates a motif for your emotional life. You tell Lemotif a little bit about your day -- what were salient events or aspects and how they made you feel. Lemotif will generate a lemotif -- a creative abstract visual depiction of your emotions and their sources. Over time, Lemotif can create visual motifs to capture a summary of your emotional states over arbitrary periods of time -- making patterns in your emotions and their sources apparent, presenting opportunities to take actions, and measure their effectiveness. The underlying principles in Lemotif are that the lemotif should (1) separate out the sources of the emotions, (2) depict these sources visually, (3) depict the emotions visually, and (4) have a creative aspect to them. We verify via human studies that each of these factors contributes to the proposed lemotifs being favored over corresponding baselines.
Understanding hidden memories of recurrent neural networks, Ming et al., VAST'17. Last week we looked at CORALS, winner of round 9 of the Yelp dataset challenge. Today's paper choice was a winner in round 10. We're used to visualisations of CNNs, which give interpretations of what is being learned in the hidden layers. But the inner workings of Recurrent Neural Networks (RNNs) have remained something of a mystery. RNNvis is a tool for visualising and exploring RNN models.
Machine learning clustering techniques are not the only way to extract topics from a text data set. Text mining literature has proposed a number of statistical models, known as probabilistic topic models, to detect topics from an unlabeled set of documents. One of the most popular models is the latent Dirichlet allocation (LDA) algorithm developed by Blei, Ng, and Jordan [i]. LDA is a generative unsupervised probabilistic algorithm that isolates the top K topics in a data set as described by the most relevant N keywords. In other words, the documents in the data set are represented as random mixtures of latent topics, where the mixture weights are drawn from a Dirichlet distribution and each topic is characterized by a probability distribution over a fixed vocabulary.
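The generative story behind LDA can be sketched in a few lines. This is a toy illustration of how a single document is generated, not an implementation of LDA inference: the two topics, their word probabilities, and the function names are all hypothetical, and only the Python standard library is used.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, length, rng):
    """Generate one document following LDA's generative process.

    topics: list of K dicts mapping word -> probability (toy values).
    alpha:  list of K Dirichlet concentration parameters.
    """
    theta = sample_dirichlet(alpha, rng)  # per-document topic mixture
    words = []
    for _ in range(length):
        # Pick a topic for this word position according to the mixture.
        k = rng.choices(range(len(topics)), weights=theta)[0]
        # Pick a word from that topic's distribution over the vocabulary.
        vocab = list(topics[k])
        weights = [topics[k][w] for w in vocab]
        words.append(rng.choices(vocab, weights=weights)[0])
    return theta, words


# Hypothetical toy topics, invented for illustration only.
TOPICS = [
    {"goal": 0.5, "match": 0.3, "team": 0.2},
    {"stock": 0.5, "market": 0.3, "bank": 0.2},
]

rng = random.Random(0)
theta, doc = generate_document(TOPICS, alpha=[0.5, 0.5], length=8, rng=rng)
print([round(t, 3) for t in theta], doc)
```

Fitting LDA runs this process in reverse: given only the observed words, it estimates the topic-word distributions and the per-document mixtures that most plausibly produced them.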
With one billion monthly viewers, and millions of users discussing and sharing opinions, comments below YouTube videos are rich sources of data for opinion mining and sentiment analysis. We introduce the YouTube AV 50K dataset, a freely available collection of more than 50,000 YouTube comments and metadata below autonomous vehicle (AV)-related videos. We describe its creation process, its content and data format, and discuss its possible uses. In particular, we conduct a case study of the first self-driving car fatality to evaluate the dataset, and show how we can use this dataset to better understand public attitudes toward self-driving cars and public reactions to the accident. Future developments of the dataset are also discussed.