[R] [1706.04223] Adversarially Regularized Autoencoders for Generating Discrete Structures • r/MachineLearning
Why did they not use an (additive) attention mechanism for the encoder and decoder? It's heavily proven to improve gradient flow. Perhaps I'm missing something big here that would make this training procedure not work.
Jun-15-2017, 23:40:15 GMT
- Technology: