So far in the deep learning journey on this website, I've introduced dense neural networks and convolutional neural networks (CNNs), and shown how to use them to perform classification tasks on static images. We've seen good results, especially with CNNs. However, what happens if we want to analyze dynamic data? There are ways to do some of this using CNNs, but the most popular method of performing classification and other analysis on sequences of data is recurrent neural networks. This tutorial is a comprehensive introduction to recurrent neural networks and a subset of such networks: long short-term memory networks (LSTM networks). I'll also show you how to implement such networks in TensorFlow, including the data preparation step. It's going to be a long one, so settle in and enjoy these pivotal networks in deep learning. By the end of this post, you'll have a solid understanding of recurrent neural networks and LSTMs.
In previous posts, I introduced Keras for building convolutional neural networks and performing word embedding. The next natural step is to implement recurrent neural networks in Keras. In a previous tutorial of mine, I gave a comprehensive introduction to recurrent neural networks and long short-term memory (LSTM) networks, implemented in TensorFlow. In this Keras LSTM tutorial, we'll implement a sequence-to-sequence text prediction model using a large text data set called the PTB corpus. All the code in this tutorial can be found on this site's GitHub repository.
As you read this article, you understand each word based on your understanding of the previous words. You don't throw everything away and start thinking from scratch again. We have already seen in Introduction to Artificial Neural Networks (ANN) how ANNs can be used for regression and classification tasks, and in Introduction to Convolutional Neural Networks (CNN) how CNNs can be used for image recognition, segmentation, object detection, and other computer vision tasks. But what if we have sequential data? Before we dig into the details of recurrent neural networks, if you are a beginner I suggest you read the two articles below to get a basic understanding of neural networks.
First of all, it's worth underlining why this problem matters so much today, and therefore why it is interesting to understand the role and potential of deep learning in this area. In recent years, time series classification has become one of the most challenging problems in data science. This is because any classification problem whose data carries some notion of ordering can be treated as a time series classification problem. Time series are present in many real-world applications, including health care, human activity recognition, cyber-security, finance, marketing, automated disease detection, and anomaly detection. As the availability of temporal data has increased significantly in recent years, many fields have become strongly interested in applications based on time series, and many new algorithms have been proposed. All of these algorithms, apart from those based on deep learning, require some kind of feature engineering as a separate task before the classification is performed, which can lose information and increase development time. By contrast, deep learning models incorporate this kind of feature engineering internally, optimizing it and eliminating the need to do it manually.
Traditionally, recurrent neural networks and their variants have been used extensively for natural language processing problems. In recent years, transformers have outperformed most RNN models. Before looking at transformers, let's revisit recurrent neural networks, how they work, and where they fall behind. There are different types of recurrent neural networks. In natural language processing, RNNs are often used in an encoder-decoder architecture. The encoder summarizes all the information from the input sentence, and the decoder uses the encoder's output to generate the output sequence.
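To make the encoder-decoder idea more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the implementation used in any of the tutorials mentioned here: the weights are random and untrained, the cell is a vanilla RNN rather than an LSTM, and every name in it (`rnn_step`, `encode`, `decode`, `VOCAB`, `HIDDEN`) is hypothetical. It only shows how the encoder folds a whole input sequence into one hidden state, and how the decoder starts from that state and emits one token at a time.

```python
import math
import random


def make_matrix(rows, cols, rng):
    # Small random weights; a real model would learn these during training.
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    # Plain matrix-vector product.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def rnn_step(w_x, w_h, x, h):
    # One vanilla RNN step: h_new = tanh(W_x x + W_h h).
    return [math.tanh(a + b) for a, b in zip(matvec(w_x, x), matvec(w_h, h))]


def encode(tokens, params, vocab_size, hidden_size):
    # The encoder reads the input one token at a time and carries its
    # hidden state forward; the final state summarizes the whole input.
    h = [0.0] * hidden_size
    for token in tokens:
        h = rnn_step(params["enc_Wx"], params["enc_Wh"], one_hot(token, vocab_size), h)
    return h


def decode(context, params, vocab_size, max_len, start_token=0):
    # The decoder starts from the encoder's summary and feeds each
    # predicted token back in as the next input (greedy decoding).
    h = context
    token = start_token
    output = []
    for _ in range(max_len):
        h = rnn_step(params["dec_Wx"], params["dec_Wh"], one_hot(token, vocab_size), h)
        logits = matvec(params["dec_Wo"], h)
        token = max(range(vocab_size), key=lambda i: logits[i])
        output.append(token)
    return output


# Assumed toy sizes: 6 token ids in the vocabulary, 4 hidden units.
VOCAB, HIDDEN = 6, 4
rng = random.Random(0)
params = {
    "enc_Wx": make_matrix(HIDDEN, VOCAB, rng),
    "enc_Wh": make_matrix(HIDDEN, HIDDEN, rng),
    "dec_Wx": make_matrix(HIDDEN, VOCAB, rng),
    "dec_Wh": make_matrix(HIDDEN, HIDDEN, rng),
    "dec_Wo": make_matrix(VOCAB, HIDDEN, rng),
}

context = encode([1, 3, 2], params, VOCAB, HIDDEN)
output_tokens = decode(context, params, VOCAB, max_len=5)
print("context vector:", context)
print("decoded token ids:", output_tokens)
```

Because the weights are untrained, the decoded token ids are arbitrary. The point is the data flow: the entire input sentence has to pass through the single fixed-size `context` vector before the decoder sees any of it.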