Benchmarking LLMs via Uncertainty Quantification