LLMs and the Madness of Crowds

Bradley, William F.

arXiv.org Artificial Intelligence 

When an LLM is sampled at a positive temperature, posing the same problem repeatedly yields a distribution over the possible answers. If the LLM performs well, we would expect most of the probability mass to lie on the correct answer; if it performs poorly, we might expect the distribution to be closer to uniform across all the answers. However, this intuition does not always hold. To better understand the actual behavior of LLMs, we provide several detailed examples in Section 2. In Section 3, we perform a more comprehensive analysis at scale and use the results to suggest a taxonomy of LLMs.
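As a rough illustration of the answer distribution described above, the following minimal sketch tallies repeated responses to a single multiple-choice question and summarizes how concentrated they are. The sample responses, the correct answer "B", the four-option setup, and the helper names `answer_distribution` and `normalized_entropy` are all assumptions made for illustration; they are not taken from the paper, and entropy is only one plausible way to quantify how uniform a distribution is.

```python
import math
from collections import Counter


def answer_distribution(answers):
    """Return the empirical probability of each distinct answer."""
    counts = Counter(answers)
    total = sum(counts.values())
    return {answer: count / total for answer, count in counts.items()}


def normalized_entropy(dist, num_possible_answers):
    """Shannon entropy scaled to [0, 1].

    0 means all mass sits on one answer; 1 means the mass is spread
    uniformly over num_possible_answers options.
    """
    if num_possible_answers < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in dist.values() if p > 0)
    return entropy / math.log(num_possible_answers)


# Hypothetical responses from posing the same four-option question
# ten times at a positive temperature (illustrative data only).
samples = ["B", "B", "A", "B", "C", "B", "B", "D", "B", "B"]
correct_answer = "B"  # assumed ground truth for this toy example

dist = answer_distribution(samples)
mass_on_correct = dist.get(correct_answer, 0.0)
spread = normalized_entropy(dist, num_possible_answers=4)

print(f"distribution: {dist}")
print(f"mass on correct answer: {mass_on_correct:.2f}")
print(f"normalized entropy: {spread:.2f}")
```

Under the intuition stated above, a strong model would show high mass on the correct answer and low normalized entropy, while a weak one would show entropy near 1. The abstract's point is that real models do not always fall neatly into either pattern.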