[D] Backpropagating to LSTM inputs!
Hi, I'm trying an architecture that's a sort of autoencoder where the encoded representation is a string. To sidestep differentiability issues, I'm not actually encoding it as a discrete string, but as the softmax of the encoder LSTM's output. This tensor is then fed into the decoder LSTM. However, I'm noticing a huge difference (on the order of 10³ or 10⁴) between the gradients computed at the decoder LSTM's outputs and those at its inputs during backpropagation. In other words, the LSTM seems to barely propagate any gradient back to the input sequence.
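In case it helps, here's a minimal PyTorch sketch of what I mean (the module names like `to_vocab` and the dimensions are just placeholders, not my real model): the encoder LSTM's outputs go through a softmax to form the "soft string", that tensor feeds the decoder LSTM, and `retain_grad()` lets me compare gradient magnitudes at the decoder's output versus its input after `backward()`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy dimensions, just for illustration
vocab, hidden, seq_len, batch, feat = 16, 32, 10, 4, 8

encoder = nn.LSTM(input_size=feat, hidden_size=hidden, batch_first=True)
to_vocab = nn.Linear(hidden, vocab)           # projects encoder states to "characters"
decoder = nn.LSTM(input_size=vocab, hidden_size=hidden, batch_first=True)
readout = nn.Linear(hidden, feat)             # reconstructs the original features

x = torch.randn(batch, seq_len, feat)

enc_out, _ = encoder(x)
code = torch.softmax(to_vocab(enc_out), dim=-1)  # the "soft string" representation
code.retain_grad()                                # keep grads on this non-leaf tensor
dec_out, _ = decoder(code)
dec_out.retain_grad()

loss = ((readout(dec_out) - x) ** 2).mean()       # simple reconstruction loss
loss.backward()

# Compare mean gradient magnitudes at the decoder's output vs. its input
ratio = dec_out.grad.abs().mean() / code.grad.abs().mean()
print(f"mean |grad| at decoder output vs input: ratio = {ratio:.2f}")
```

This is roughly how I'm measuring the discrepancy; the ratio I see in my real model is what's so large.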
Feb-26-2021, 03:12:03 GMT