Notes on Hierarchical Multiscale Recurrent Neural Networks

Sep-30-2016, 03:35:48 GMT–#artificialintelligence

There is a lot of prior work on hierarchy (hierarchical RNN / stacked RNN) and on multiple timescales (LSTM, clockwork RNN), but all of it relies on pre-defined boundaries, pre-defined scales, or soft, non-hierarchical boundaries. The HM-RNN instead learns discrete boundaries, which avoids the "soft" gating that leads to the "curse of updating every timestep". Discrete (binary) decisions are difficult to optimize because the step function is non-smooth: its gradient is zero almost everywhere. The model therefore uses the straight-through estimator (as an alternative to REINFORCE) to learn the discrete variables. The simplest variant applies a step function on the forward pass and uses the gradient of a hard sigmoid on the backward pass.
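The straight-through idea can be illustrated with a minimal scalar sketch in plain Python. This is not the authors' implementation. The function names, the `slope` parameter, and the particular hard-sigmoid definition `clip((slope * x + 1) / 2, 0, 1)` are assumptions chosen for illustration. The forward pass emits a hard 0/1 decision, while the backward pass uses the derivative of the hard sigmoid in place of the step function's gradient.

```python
def hard_sigmoid(x, slope=1.0):
    """Piecewise-linear sigmoid (assumed form): clip((slope * x + 1) / 2, 0, 1)."""
    return max(0.0, min(1.0, (slope * x + 1.0) / 2.0))


def hard_sigmoid_grad(x, slope=1.0):
    """Derivative of hard_sigmoid: slope / 2 in the linear region, 0 where it saturates."""
    y = (slope * x + 1.0) / 2.0
    return slope / 2.0 if 0.0 < y < 1.0 else 0.0


def boundary_forward(x, slope=1.0):
    """Forward pass: hard binary decision via a step function thresholded at 0.5."""
    return 1.0 if hard_sigmoid(x, slope) > 0.5 else 0.0


def boundary_backward(x, upstream_grad, slope=1.0):
    """Backward pass: straight-through estimate, as if the forward pass were hard_sigmoid."""
    return upstream_grad * hard_sigmoid_grad(x, slope)


if __name__ == "__main__":
    x = 0.4
    z = boundary_forward(x)                       # hard 0/1 decision
    g = boundary_backward(x, upstream_grad=2.0)   # surrogate gradient w.r.t. x
    print(z, g)
```

The true derivative of the step function would be zero for every input shown here, so no learning signal would reach `x`. The surrogate gradient is non-zero whenever the pre-activation lies in the linear region of the hard sigmoid.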

hierarchical multiscale recurrent neural network, machine learning, natural language
