Basically, each X(t-n) consists of the full set of input connections fed to the network at that particular timestep of the sequence. Also not shown is the fact that each gate and the cell has its own set of weights and biases for both the input and the recurrent connections. Thus, an LSTM actually has four sets of input weights, recurrent weights, and biases: one for each of the three gates and one for the cell input. In practice, the input is usually represented as a three-dimensional tensor with shape (batch, timestep, input).
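To make the parameter count and the input shape concrete, here is a minimal NumPy sketch of an LSTM forward pass. It assumes the common formulation with input, forget, and output gates plus a cell candidate. The function names (`init_lstm_params`, `lstm_forward`), the gate keys (`i`, `f`, `o`, `g`), and the sizes are illustrative choices, not taken from any particular library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_lstm_params(input_dim, hidden_dim, seed=0):
    """Hypothetical helper: build four parameter sets, one per gate/cell input."""
    rng = np.random.default_rng(seed)
    params = {}
    # "i" = input gate, "f" = forget gate, "o" = output gate, "g" = cell candidate
    for name in ("i", "f", "o", "g"):
        params[name] = {
            "W": rng.normal(0.0, 0.1, size=(input_dim, hidden_dim)),   # input weights
            "U": rng.normal(0.0, 0.1, size=(hidden_dim, hidden_dim)),  # recurrent weights
            "b": np.zeros(hidden_dim),                                 # bias
        }
    return params

def lstm_forward(x, params):
    """Run an LSTM over x with shape (batch, timestep, input)."""
    batch, timesteps, _ = x.shape
    hidden_dim = params["i"]["b"].shape[0]
    h = np.zeros((batch, hidden_dim))  # hidden state
    c = np.zeros((batch, hidden_dim))  # cell state
    outputs = []
    for t in range(timesteps):
        x_t = x[:, t, :]  # the full input vector for this timestep
        # Each gate combines the current input and the previous hidden state
        # using its own weights and bias.
        pre = {k: x_t @ p["W"] + h @ p["U"] + p["b"] for k, p in params.items()}
        i = sigmoid(pre["i"])
        f = sigmoid(pre["f"])
        o = sigmoid(pre["o"])
        g = np.tanh(pre["g"])
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs, axis=1)  # (batch, timestep, hidden)

# Assumed sizes for illustration: batch=2, timesteps=5, input=3, hidden=4
x = np.random.default_rng(1).normal(size=(2, 5, 3))
params = init_lstm_params(input_dim=3, hidden_dim=4)
hs = lstm_forward(x, params)
n_params = sum(a.size for p in params.values() for a in p.values())
print(hs.shape)   # (2, 5, 4)
print(n_params)   # 4 * (3*4 + 4*4 + 4) = 128
```

With these sizes each of the four sets holds 3*4 input weights, 4*4 recurrent weights, and 4 biases, so the layer has 128 parameters in total. Real libraries typically pack the four sets into single concatenated matrices for speed, but the count is the same.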
Jun-17-2017, 22:15:08 GMT