Reviews: Preventing Gradient Explosions in Gated Recurrent Units

Oct-8-2024, 12:50:23 GMT–Neural Information Processing Systems

Summary: The authors propose a method for optimizing GRU networks that aims to prevent exploding gradients. They motivate the method by showing that constraining the spectral norm of the state-to-state matrix keeps the network's dynamics stable near the fixed point 0. The method is evaluated on language modelling and a music prediction task, and it leads to more stable training than weight clipping.

Technical quality: The motivation of the method is well developed, and it is nice that the method is evaluated on two different real-world datasets. However, one important issue I have with the evaluation is that the learning rate is not controlled for in the experiments, which unfortunately makes it hard to draw strong conclusions from the results.
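The stability argument the review summarizes can be illustrated under simplifying assumptions that are ours, not necessarily the authors': zero input, zero biases, and the GRU update h' = (1 − z) ⊙ h + z ⊙ h̃ with candidate h̃ = tanh(W x + U (r ⊙ h)). At h = 0 both gates equal σ(0) = 0.5 and h̃ = tanh(0) = 0, so the gate matrices drop out of the linearization and the Jacobian reduces to J = 0.5 I + 0.25 U. Hence ||J||₂ ≤ 0.5 + 0.25 ||U||₂, which is below 1 (a locally contracting fixed point) whenever ||U||₂ < 2. This bound is a sufficient condition derived here for illustration, not necessarily the threshold the paper uses. The sketch below estimates spectral norms by power iteration and enforces the constraint by uniformly rescaling U, a spectral-normalization-style step swapped in for illustration; the paper's exact procedure may differ, and all function names are hypothetical.

```python
import math
import random

Matrix = list[list[float]]


def matvec(a: Matrix, v: list[float]) -> list[float]:
    """Multiply matrix a by vector v."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def spectral_norm(a: Matrix, iters: int = 200, seed: int = 0) -> float:
    """Estimate the largest singular value of a by power iteration on a^T a."""
    rng = random.Random(seed)
    at = transpose(a)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(a[0]))]
    for _ in range(iters):
        w = matvec(at, matvec(a, v))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # a^T a annihilated the iterate, e.g. a is the zero matrix
            return 0.0
        v = [x / norm for x in w]
    # For a unit v aligned with the top right-singular vector, ||a v|| = sigma_max.
    av = matvec(a, v)
    return math.sqrt(sum(x * x for x in av))


def constrain_spectral_norm(u: Matrix, max_norm: float) -> Matrix:
    """Hypothetical post-update step: rescale u so its spectral norm <= max_norm.

    This is uniform (spectral-normalization-style) rescaling, not singular-value
    clipping, and is only an assumed stand-in for the paper's procedure.
    """
    sigma = spectral_norm(u)
    if sigma <= max_norm:
        return [row[:] for row in u]
    scale = max_norm / sigma
    return [[scale * x for x in row] for row in u]


def gru_jacobian_at_zero(u: Matrix) -> Matrix:
    """Jacobian dh'/dh at h = 0, assuming zero input and zero biases.

    Both gates equal sigmoid(0) = 0.5 and the candidate is tanh(0) = 0,
    so J = 0.5 * I + 0.25 * u, with u the candidate's state-to-state matrix.
    """
    n = len(u)
    return [[(0.5 if i == j else 0.0) + 0.25 * u[i][j] for j in range(n)]
            for i in range(n)]


def locally_contracting(u: Matrix) -> bool:
    """True if ||J||_2 < 1, a sufficient condition for h = 0 to be locally stable."""
    return spectral_norm(gru_jacobian_at_zero(u)) < 1.0


if __name__ == "__main__":
    u = [[3.0, 0.0], [0.0, 1.0]]               # ||u||_2 = 3, above the bound of 2
    print(locally_contracting(u))               # False: ||J||_2 is about 1.25
    u_safe = constrain_spectral_norm(u, 1.9)    # rescale so ||u||_2 is about 1.9
    print(locally_contracting(u_safe))          # True: ||J||_2 is about 0.975
```

Uniform rescaling shrinks every direction of U by the same factor. Clipping only the offending singular values via an SVD perturbs U less (it is the Frobenius-norm projection onto the spectral-norm ball) at the cost of a full decomposition per update.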

gated recurrent unit, grus, preventing gradient explosion
