Eigenvalue Normalized Recurrent Neural Networks for Short Term Memory
The underlying dynamical system carries temporal information from one time step to another and captures potential dependencies among the terms of a sequence. Like other deep neural networks, the weights of an RNN are learned by gradient descent. For the input at a time step to affect the output at a later time step, the gradients must back-propagate through each intermediate step. Since a sequence can be quite long, RNNs are prone to vanishing or exploding gradients, as described in (Bengio, Frasconi, and Simard 1993) and (Pascanu, Mikolov, and Bengio 2013). One consequence of this well-known problem is that the network has difficulty modeling input-output dependencies that span a large number of time steps. Many architectures have been designed to mitigate this problem. The most popular RNN architectures, such as LSTMs (Hochreiter and Schmidhuber 1997) and GRUs (Cho et al. 2014), incorporate a gating mechanism to explicitly retain or discard information.
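The following is a minimal numpy sketch, not taken from the paper, illustrating the vanishing/exploding gradient behavior described above: for a vanilla RNN with tanh activation, the gradient of a late hidden state with respect to an early one is a product of per-step Jacobians, and its norm shrinks or blows up depending on the spectral radius of the recurrent weight matrix. The dimensions, step count, and function names here are illustrative assumptions.

```python
# Sketch (assumed setup, not the paper's code): measure ||d h_T / d h_0|| for a
# vanilla tanh RNN whose recurrent matrix W is rescaled to a chosen spectral radius.
import numpy as np

rng = np.random.default_rng(0)
hidden = 64      # hidden state size (illustrative)
steps = 100      # sequence length (illustrative)

def gradient_norm_through_time(spectral_radius):
    # Random recurrent matrix rescaled so its largest eigenvalue magnitude
    # equals the requested spectral radius.
    W = rng.standard_normal((hidden, hidden))
    W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
    h = rng.standard_normal(hidden)
    grad = np.eye(hidden)  # d h_T / d h_T = I
    for _ in range(steps):
        h = np.tanh(W @ h)
        # One-step Jacobian: d h_t / d h_{t-1} = diag(1 - tanh(.)^2) @ W
        grad = (np.diag(1.0 - h ** 2) @ W) @ grad
    return np.linalg.norm(grad)

for rho in (0.9, 1.0, 1.5):
    print(f"spectral radius {rho}: ||d h_T / d h_0|| ~ {gradient_norm_through_time(rho):.2e}")
```

With a spectral radius below 1 the gradient norm collapses toward zero over long sequences, and above 1 it grows rapidly; this is the behavior that gating mechanisms (and, in this paper, eigenvalue normalization) are intended to control.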