Path Normalized Optimization of Recurrent Neural Networks with ReLU Activations
Neural Information Processing Systems
We investigate the parameter-space geometry of recurrent neural networks (RNNs) and develop an adaptation of the path-SGD optimization method, attuned to this geometry, that can learn plain RNNs with ReLU activations. On several datasets that require capturing long-term dependency structure, we show that path-SGD can significantly improve the trainability of ReLU RNNs compared to RNNs trained with SGD, even under various recently proposed initialization schemes.
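Path-SGD rescales per-weight gradient steps by a path-based measure of the network's scaling-invariant geometry. As a minimal sketch (not the paper's recurrent formulation), the snippet below computes the path norm of a two-layer feedforward ReLU net — the sum over all input-to-output paths of the product of squared weights along each path — together with the per-weight scaling factors that path-SGD divides raw gradients by. Function names and the NumPy setup are illustrative assumptions.

```python
import numpy as np

def path_norm(W1, W2):
    """Path norm of a 2-layer ReLU net: sum over paths of prod of squared weights.

    W1: (d_in, h) input-to-hidden weights; W2: (h, d_out) hidden-to-output.
    Factorizes over hidden units: sum_j (sum_i W1[i,j]^2) * (sum_k W2[j,k]^2).
    """
    in_sq = (W1 ** 2).sum(axis=0)    # squared path mass entering each hidden unit
    out_sq = (W2 ** 2).sum(axis=1)   # squared path mass leaving each hidden unit
    return float(in_sq @ out_sq)

def path_sgd_scaling(W1, W2):
    """Per-weight scaling kappa(w): squared path mass through each weight,
    excluding the weight itself. A path-SGD step divides the gradient of
    each weight elementwise by its kappa (illustrative 2-layer case)."""
    out_sq = (W2 ** 2).sum(axis=1)   # (h,)  outgoing mass per hidden unit
    in_sq = (W1 ** 2).sum(axis=0)    # (h,)  incoming mass per hidden unit
    k1 = np.broadcast_to(out_sq[None, :], W1.shape).copy()  # scale for W1[i, j]
    k2 = np.broadcast_to(in_sq[:, None], W2.shape).copy()   # scale for W2[j, k]
    return k1, k2

# Tiny example: 2 inputs, 2 hidden units, 1 output.
W1 = np.array([[1., 2.], [3., 4.]])
W2 = np.array([[1.], [2.]])
print(path_norm(W1, W2))  # -> 90.0
```

Because ReLU is positively homogeneous, rescaling a hidden unit's incoming weights by c and its outgoing weights by 1/c leaves the function (and the path norm) unchanged; updates scaled this way are invariant to such reparameterizations, which plain SGD is not.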