Russell Stewart


Training TensorFlow's large language model on the Penn Treebank yields a test perplexity of 82. Whether a high or a low temperature sample reads better depends on your personal taste. The high temperature sample displays greater linguistic variety, but the low temperature sample is more grammatically correct. Such is the world of temperature sampling: lowering the temperature lets you focus on higher probability output sequences and smooth over deficiencies of the model. With a low temperature, temperature sampling works by increasing the probability of the most likely words before sampling. Suppose I ask you what day of the week it is, and you have a 70% chance of knowing the answer.
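To make the reweighting concrete, here is a minimal sketch of temperature sampling over a small discrete distribution. It assumes the common formulation in which each probability is raised to the power 1/T and the result is renormalized; the post does not spell out its exact formula, so treat this as one plausible reading. The function names `apply_temperature` and `sample` are hypothetical, and the day-of-week numbers (70% on the correct day, the remaining 30% split evenly over the other six) are an illustrative assumption rather than something stated in the text.

```python
import math
import random


def apply_temperature(probs, temperature):
    """Return probs reweighted as p_i ** (1 / T), renormalized to sum to 1.

    Assumes probs is a non-empty list of non-negative numbers with at
    least one positive entry.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not any(p > 0 for p in probs):
        raise ValueError("at least one probability must be positive")
    # Work in log space for numerical stability; zero-probability
    # entries stay at zero.
    scaled = [math.log(p) / temperature if p > 0 else float("-inf")
              for p in probs]
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def sample(probs, temperature=1.0, rng=random):
    """Draw one index from probs after applying the temperature."""
    reweighted = apply_temperature(probs, temperature)
    return rng.choices(range(len(reweighted)), weights=reweighted, k=1)[0]


# Illustrative distribution (an assumption): 70% on the correct day,
# 5% on each of the other six days.
day_probs = [0.70] + [0.05] * 6

for t in (0.5, 1.0, 2.0):
    top_prob = apply_temperature(day_probs, t)[0]
    print(f"T={t}: probability of the most likely day = {top_prob:.3f}")
```

Under this formulation, a temperature below 1 concentrates mass on the most likely answer, a temperature of 1 leaves the distribution unchanged, and a temperature above 1 flattens it toward uniform.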