A User-Centric Benchmark for Evaluating Large Language Models