What is the best model? Application-driven Evaluation for Large Language Models