Maximum Likelihood Decoding with RNNs - the good, the bad, and the ugly - The Stanford Natural Language Processing Group

Training TensorFlow's large language model on the Penn Treebank yields a test perplexity of 82. Whether a high or a low temperature sample is better depends on your personal taste. The high temperature sample displays greater linguistic variety, but the low temperature sample is more grammatically correct. Such is the world of temperature sampling: lowering the temperature allows you to focus on higher probability output sequences and smooth over deficiencies of the model. At temperatures below 1, temperature sampling works by increasing the probability of the most likely words before sampling.
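
The reweighting described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code behind the samples discussed here: it assumes the common formulation in which the model's unnormalized scores (logits) are divided by a temperature T before the softmax, and the function names `apply_temperature` and `sample_with_temperature`, along with the example logits, are hypothetical.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities.

    Assumed formulation: T < 1 sharpens the distribution toward the most
    likely words, T > 1 flattens it, and T = 1 leaves it unchanged.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_with_temperature(logits, temperature, rng=random):
    """Draw one word index from the temperature-adjusted distribution."""
    probs = apply_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical scores for a three-word vocabulary.
example_logits = [2.0, 1.0, 0.1]
low = apply_temperature(example_logits, 0.5)   # sharper: favors word 0
base = apply_temperature(example_logits, 1.0)  # the model's own distribution
high = apply_temperature(example_logits, 2.0)  # flatter: more variety
```

With these inputs, the most likely word receives more probability mass at T = 0.5 than at T = 1, and less at T = 2, which is the trade-off between grammatical safety and linguistic variety described above.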