Eigenvalue Normalized Recurrent Neural Networks for Short Term Memory
The underlying dynamical system carries temporal information from one time step to another and captures potential dependencies among the terms of a sequence. Like other deep neural networks, the weights of an RNN are learned by gradient descent. For the input at a time step to affect the output at a later time step, the gradients must back-propagate through each intermediate step. Since a sequence can be quite long, RNNs are prone to vanishing or exploding gradients, as described in (Bengio, Frasconi, and Simard 1993) and (Pascanu, Mikolov, and Bengio 2013). One consequence of this well-known problem is that the network has difficulty modeling input-output dependencies that span a large number of time steps. Many architectures have been designed to mitigate this problem. The most popular RNN architectures, such as LSTMs (Hochreiter and Schmidhuber 1997) and GRUs (Cho et al. 2014), incorporate a gating mechanism to explicitly retain or discard information.
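The following is a minimal numpy sketch, not taken from the paper, illustrating the vanishing/exploding gradient behavior described above: for a vanilla RNN with tanh activation, the gradient of a late hidden state with respect to an early one is a product of per-step Jacobians, and its norm shrinks or blows up depending on the spectral radius of the recurrent weight matrix. The dimensions, step count, and function names here are illustrative assumptions.

```python
# Sketch (assumed setup, not the paper's code): measure ||d h_T / d h_0|| for a
# vanilla tanh RNN whose recurrent matrix W is rescaled to a chosen spectral radius.
import numpy as np

rng = np.random.default_rng(0)
hidden = 64      # hidden state size (illustrative)
steps = 100      # sequence length (illustrative)

def gradient_norm_through_time(spectral_radius):
    # Random recurrent matrix rescaled so its largest eigenvalue magnitude
    # equals the requested spectral radius.
    W = rng.standard_normal((hidden, hidden))
    W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
    h = rng.standard_normal(hidden)
    grad = np.eye(hidden)  # d h_T / d h_T = I
    for _ in range(steps):
        h = np.tanh(W @ h)
        # One-step Jacobian: d h_t / d h_{t-1} = diag(1 - tanh(.)^2) @ W
        grad = (np.diag(1.0 - h ** 2) @ W) @ grad
    return np.linalg.norm(grad)

for rho in (0.9, 1.0, 1.5):
    print(f"spectral radius {rho}: ||d h_T / d h_0|| ~ {gradient_norm_through_time(rho):.2e}")
```

With a spectral radius below 1 the gradient norm collapses toward zero over long sequences, and above 1 it grows rapidly; this is the behavior that gating mechanisms (and, in this paper, eigenvalue normalization) are intended to control.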