Path Normalized Optimization of Recurrent Neural Networks with ReLU Activations
Neural Information Processing Systems
We investigate the parameter-space geometry of recurrent neural networks (RNNs) and develop an adaptation of the path-SGD optimization method, attuned to this geometry, that can learn plain RNNs with ReLU activations. On several datasets that require capturing long-term dependency structure, we show that path-SGD can significantly improve the trainability of ReLU RNNs compared to RNNs trained with SGD, even under various recently proposed initialization schemes.
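Path-SGD rescales per-weight gradient steps by a path-based measure of the network's scaling-invariant geometry. As a minimal sketch (not the paper's recurrent formulation), the snippet below computes the path norm of a two-layer feedforward ReLU net — the sum over all input-to-output paths of the product of squared weights along each path — together with the per-weight scaling factors that path-SGD divides raw gradients by. Function names and the NumPy setup are illustrative assumptions.

```python
import numpy as np

def path_norm(W1, W2):
    """Path norm of a 2-layer ReLU net: sum over paths of prod of squared weights.

    W1: (d_in, h) input-to-hidden weights; W2: (h, d_out) hidden-to-output.
    Factorizes over hidden units: sum_j (sum_i W1[i,j]^2) * (sum_k W2[j,k]^2).
    """
    in_sq = (W1 ** 2).sum(axis=0)    # squared path mass entering each hidden unit
    out_sq = (W2 ** 2).sum(axis=1)   # squared path mass leaving each hidden unit
    return float(in_sq @ out_sq)

def path_sgd_scaling(W1, W2):
    """Per-weight scaling kappa(w): squared path mass through each weight,
    excluding the weight itself. A path-SGD step divides the gradient of
    each weight elementwise by its kappa (illustrative 2-layer case)."""
    out_sq = (W2 ** 2).sum(axis=1)   # (h,)  outgoing mass per hidden unit
    in_sq = (W1 ** 2).sum(axis=0)    # (h,)  incoming mass per hidden unit
    k1 = np.broadcast_to(out_sq[None, :], W1.shape).copy()  # scale for W1[i, j]
    k2 = np.broadcast_to(in_sq[:, None], W2.shape).copy()   # scale for W2[j, k]
    return k1, k2

# Tiny example: 2 inputs, 2 hidden units, 1 output.
W1 = np.array([[1., 2.], [3., 4.]])
W2 = np.array([[1.], [2.]])
print(path_norm(W1, W2))  # -> 90.0
```

Because ReLU is positively homogeneous, rescaling a hidden unit's incoming weights by c and its outgoing weights by 1/c leaves the function (and the path norm) unchanged; updates scaled this way are invariant to such reparameterizations, which plain SGD is not.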