In the deep learning journey so far on this website, I've introduced dense neural networks and convolutional neural networks (CNNs), and shown how to use them for classification tasks on static images. We've seen good results, especially with CNNs. However, what happens if we want to analyze dynamic data? There are ways to do some of this using CNNs, but the most popular method of performing classification and other analysis on sequences of data is recurrent neural networks. This tutorial will be a comprehensive introduction to recurrent neural networks and a subset of such networks: long short-term memory networks (or LSTM networks). I'll also show you how to implement such networks in TensorFlow, including the data preparation step. It's going to be a long one, so settle in and enjoy these pivotal networks in deep learning. By the end of this post, you'll have a solid understanding of recurrent neural networks and LSTMs.
The vanishing gradients problem is one example of unstable behavior that you may encounter when training a deep neural network. It describes the situation where a deep multilayer feed-forward network or a recurrent neural network is unable to propagate useful gradient information from the output end of the model back to the layers near the input end of the model. The result is that models with many layers are generally unable to learn on a given dataset, or converge prematurely to a poor solution. Many fixes and workarounds have been proposed and investigated, such as alternate weight initialization schemes, unsupervised pre-training, layer-wise training, and variations on gradient descent. Perhaps the most common change is the use of the rectified linear activation function, which has become the new default in place of the hyperbolic tangent activation function that was the default through the late 1990s and 2000s. In this tutorial, you will discover how to diagnose a vanishing gradient problem when training a neural network model and how to fix it using an alternate activation function and weight initialization scheme.
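To make the idea of gradient information failing to reach the input end concrete, here is a minimal sketch in plain Python. It is not the method used later in this tutorial, just an illustration under stated assumptions: the network is a stack of 20 random fully connected layers of width 32, the loss is simply the sum of the final activations, and the helper name gradient_ratio, the small initialization scale of 0.05, and the choice of He initialization for the rectified linear case are all my own picks for the example. It compares how much of the gradient signal survives the trip from the last layer back to the first.

```python
import math
import random


def tanh_grad(z):
    t = math.tanh(z)
    return 1.0 - t * t


def relu(z):
    return z if z > 0.0 else 0.0


def relu_grad(z):
    return 1.0 if z > 0.0 else 0.0


def gradient_ratio(act, act_grad, weight_std, depth=20, width=32, seed=0):
    """Return |dL/dz_first| / |dL/dz_last| for a random fully connected net.

    Hypothetical helper for illustration only; z denotes a layer's
    pre-activation and L is the sum of the final layer's activations.
    """
    rng = random.Random(seed)
    # One square weight matrix per layer, entries drawn from N(0, weight_std^2).
    weights = [[[rng.gauss(0.0, weight_std) for _ in range(width)]
                for _ in range(width)] for _ in range(depth)]
    h = [rng.gauss(0.0, 1.0) for _ in range(width)]  # random input vector

    # Forward pass: keep every pre-activation for the backward pass.
    pre_acts = []
    for W in weights:
        z = [sum(w * x for w, x in zip(row, h)) for row in W]
        pre_acts.append(z)
        h = [act(v) for v in z]

    # Backward pass: dL/dh at the output is all ones because L = sum(h).
    grad_h = [1.0] * width
    norms = []
    for W, z in zip(reversed(weights), reversed(pre_acts)):
        delta = [g * act_grad(v) for g, v in zip(grad_h, z)]  # dL/dz
        norms.append(math.sqrt(sum(d * d for d in delta)))
        # Propagate to the previous layer's activations: W^T delta.
        grad_h = [sum(W[i][j] * delta[i] for i in range(width))
                  for j in range(width)]

    last, first = norms[0], norms[-1]
    return first / last if last > 0.0 else 0.0


width = 32
# Assumption: tanh with a small Gaussian init stands in for the "old default".
tanh_ratio = gradient_ratio(math.tanh, tanh_grad, weight_std=0.05, width=width)
# Assumption: ReLU with He-style init, std = sqrt(2 / fan_in).
relu_ratio = gradient_ratio(relu, relu_grad,
                            weight_std=math.sqrt(2.0 / width), width=width)

print(f"tanh, small init: first/last gradient norm = {tanh_ratio:.3e}")
print(f"ReLU, He init:    first/last gradient norm = {relu_ratio:.3e}")
```

A ratio many orders of magnitude below one means the first layer receives almost no learning signal, which is the symptom described above. The exact numbers depend on the seed, depth, width, and initialization scale, so treat them as indicative rather than as a benchmark.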
First of all, it's worth underlining why this problem matters so much today, and therefore why it is interesting to understand the role and potential of Deep Learning in this area. In recent years, Time Series Classification has become one of the most challenging problems in Data Science. This is because any classification problem whose data carries some notion of ordering can be treated as a Time Series Classification problem. Time series appear in many real-world applications, including health care, human activity recognition, cyber-security, finance, marketing, automated disease detection, and anomaly detection. As the availability of temporal data has increased significantly in recent years, many fields have become strongly interested in applications based on time series, and many new algorithms have been proposed as a result. All of these algorithms, apart from those based on deep learning, require some kind of feature engineering as a separate step before classification is performed, which can lose information and increase development time. By contrast, deep learning models incorporate this feature engineering internally, optimizing it and removing the need to do it manually.
"Artificial Intelligence, deep learning, machine learning -- whatever you're doing if you don't understand it -- learn it. Because otherwise you're going to be a dinosaur within 3 years." This statement from Mark Cuban might sound drastic, but its message is spot on! We are in the middle of a revolution, one caused by huge volumes of data and a ton of computational power. For a minute, think how a person in the early 20th century would have felt if they did not understand electricity. They would have been used to doing things in a particular manner for ages, and all of a sudden things around them started changing.