I was reading over Block's sequence generators, which seem to use RNN's with attention mechanisms to generate sequences. I'm not completely sure (I couldn't find any example of them being used), but they seem to be designed for training in a way where they will generate sequences, then calculate loss based on the generated sequence, rather than just predict the next character like Char-RNN. For Char-RNN's they seem to be trained in a discriminative fashion, but they can be used to sample the next character in a sequence, then feed in a new string with the predicted/sampled character appended to the string. This is more of a general discussion than a single question. Is there a fundamental difference between learning a probability distribution and sampling from it (like Char-RNN), or is Char-RNN also somehow implicitly learning to become a generative model like an RBM?