Long Short-Term Memory
Originally developed in the late 1990s by Sepp Hochreiter and Jürgen Schmidhuber, the LSTM block lets part of a neural network store a value in a memory cell, with gates that control whether the cell is overwritten by an input, forgotten, or passed on to the output, somewhat like a memory cell in a computer. The main difference is that a computer's memory cell is either on or off (1 or 0), whereas each LSTM gate takes a value between zero and one, produced by a sigmoid function (although the actual voltage in a memory cell's transistors is closer to a sigmoid than to a strict 1 or 0). Because the entire network is differentiable, it can be trained with stochastic gradient descent, using backpropagation through time to compute the gradients for the weights. The advantage of this network is that a memory can be held over many time steps, for as long as the gates keep it, while ordinary recurrent networks built only from sigmoid units tend to lose their state (or memory) quickly. LSTMs have been especially successful in speech recognition and, more recently, in image recognition.
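The gating described above can be sketched as a single time step of a scalar cell. This is a minimal illustration under stated assumptions, not the original formulation: the names `lstm_step` and `sigmoid`, the weight dictionary layout, and the use of tanh for the candidate value and the output are choices made for this example, following the commonly used variant that includes a forget gate.

```python
import math


def sigmoid(x):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One time step of a scalar LSTM cell (illustrative sketch).

    x       current input
    h_prev  previous output of the block
    c_prev  previous content of the memory cell
    w       hypothetical layout: maps "input", "forget", "output" and
            "candidate" to a tuple (input weight, recurrent weight, bias)
    """
    def pre(name):
        wx, wh, b = w[name]
        return wx * x + wh * h_prev + b

    i = sigmoid(pre("input"))       # how much new content to write
    f = sigmoid(pre("forget"))      # how much old content to keep
    o = sigmoid(pre("output"))      # how much of the cell to expose
    g = math.tanh(pre("candidate")) # the new content itself

    c = f * c_prev + i * g          # updated memory cell
    h = o * math.tanh(c)            # output of the block
    return h, c
```

With the forget gate pushed close to 1 and the input gate close to 0 (for example, biases of +20 and -20), repeated calls leave the cell content almost unchanged, which is the "stored memory" behaviour the text describes. Swapping those biases makes the cell overwrite its content with the new candidate value instead.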
Sep-4-2016, 09:50:27 GMT