Estimating the Self-Consistency of LLMs
arXiv.org Artificial Intelligence
Systems often repeat the same prompt to large language models (LLMs) and aggregate responses to improve reliability. Common approaches include self-consistency or simple majority voting (sample multiple outputs and choose the mode), prompt ensembling (rephrasing prompts to reduce wording sensitivity), and multi-agent debate (running multiple instances and aggregating their conclusions). Such consensus methods can stabilize outputs and improve accuracy, especially on multi-step reasoning tasks [1]. This short note analyzes an estimator of the self-consistency of LLMs and the tradeoffs it induces under a fixed compute budget B = mn, where m is the number of prompts sampled from the task distribution and n is the number of repeated LLM calls per prompt; the resulting analysis favors a rough split m ≈ n ≈ √B. It complements recent work on self-consistency prompting that aggregates multiple sampled reasoning paths to stabilize predictions [2, 3]. Consider a prompt x that requires a binary response.
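The budget-split idea above can be sketched in code. The snippet below is a minimal illustration, not the paper's method: the `call_llm` interface and the Bernoulli simulation are assumptions introduced here. For a prompt with binary responses, the per-prompt agreement probability P(two independent calls agree) = p² + (1−p)² admits the standard unbiased estimator [k(k−1) + (n−k)(n−k−1)] / [n(n−1)] from k positive outcomes among n calls; the sketch averages this over m prompts, with the budget B = mn split roughly as m ≈ n ≈ √B.

```python
import math
import random


def agreement_estimate(samples):
    """Unbiased estimate of P(two independent calls agree) from n binary samples.

    Estimates p^2 + (1-p)^2 via k(k-1)/(n(n-1)) + (n-k)(n-k-1)/(n(n-1)),
    which is unbiased because it counts agreeing pairs among the n samples.
    """
    n = len(samples)
    k = sum(samples)
    return (k * (k - 1) + (n - k) * (n - k - 1)) / (n * (n - 1))


def estimate_self_consistency(call_llm, prompts, budget):
    """Average per-prompt agreement over m prompts with n calls each.

    Splits the budget B = m * n roughly as m = n = sqrt(B), per the note's
    analysis. `call_llm(prompt)` is a hypothetical interface returning 0 or 1.
    """
    n = max(2, math.isqrt(budget))  # need n >= 2 for the pairwise estimator
    m = max(1, budget // n)
    chosen = random.choices(prompts, k=m)  # sample m prompts from the task set
    per_prompt = [
        agreement_estimate([call_llm(x) for _ in range(n)]) for x in chosen
    ]
    return sum(per_prompt) / m
```

As a sanity check, a perfectly deterministic "model" (always answering 1) yields an estimated self-consistency of exactly 1.0, while a fair-coin responder gives values near 0.5.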
Sep-25-2025