How to Understand How LSTMs Work – untapt Engineering Insights
When modelling data that have an inherent sequential structure, such as the sequence of words in language or the millisecond-by-millisecond sequence of updates in financial market data, your first choice from the universe of Deep Learning algorithms is typically going to be a Recurrent Neural Network (RNN). RNNs enable information from previous time steps to influence the present one. The problem with vanilla RNNs, however, is that this influence drops off quickly. While the preceding step (e.g., the previous word or market-feed update) can have a great impact on the current time step, information from, say, ten steps earlier (ten words or ten market-feed updates back) has minimal or negligible impact on it. This happens because, during training, the error signal is backpropagated through every intervening time step and is multiplied at each one by a factor that is typically less than one, so the gradient reaching distant steps shrinks toward zero. The prevailing solution to this vanishing gradient problem is to add gates to individual RNN units, enabling important information from previous steps (e.g., a verb or a signal that predicts market movement) to be retained, while less consequential information (like a stop word or a boring market update) is forgotten.
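To make the drop-off concrete, here is a minimal sketch. It assumes a one-dimensional (scalar) vanilla RNN, h_t = tanh(w * h_{t-1} + u * x_t), with hand-picked weights (w = 0.5, u = 1.0) rather than trained ones, and multiplies out the chain-rule factors linking the final hidden state to the initial one. For comparison, it does the same for the additive cell-state path of a gated, LSTM-style unit, c_t = f_t * c_{t-1} + i_t * g_t, where each step contributes a factor equal to the forget gate f_t. The forget-gate values here are assumed constants, not learned, and the function names are hypothetical.

```python
import math


def vanilla_rnn_sensitivity(inputs, w=0.5, u=1.0, h0=0.0):
    """Return d h_T / d h_0 for a scalar vanilla RNN h_t = tanh(w*h_{t-1} + u*x_t).

    Weights w and u are illustrative assumptions, not trained values.
    """
    h = h0
    grad = 1.0  # running product of the per-step factors d h_t / d h_{t-1}
    for x in inputs:
        h = math.tanh(w * h + u * x)
        # Chain rule through tanh: d h_t / d h_{t-1} = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return grad


def gated_cell_sensitivity(forget_gates):
    """Return d c_T / d c_0 along the direct cell-state path of a gated unit.

    For c_t = f_t * c_{t-1} + i_t * g_t, the direct factor per step is f_t.
    Indirect paths through the gates themselves are ignored in this sketch.
    """
    grad = 1.0
    for f in forget_gates:
        grad *= f
    return grad


if __name__ == "__main__":
    inputs = [0.1] * 10           # assumed toy input sequence
    forget = [0.95] * 10          # assumed forget-gate values held near 1
    for steps in (1, 5, 10):
        v = vanilla_rnn_sensitivity(inputs[:steps])
        g = gated_cell_sensitivity(forget[:steps])
        print(f"{steps:2d} steps back: vanilla={v:.6f}  gated={g:.6f}")
```

With these settings each vanilla step multiplies the sensitivity by at most w = 0.5, so ten steps shrink it below 0.5^10 (about 0.001), whereas a forget gate held at 0.95 lets roughly 60% of it survive. Because the sketch ignores the gradient paths through the gates, it illustrates the mechanism rather than measuring a real LSTM.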