The Costly Dilemma: Generalization, Evaluation and Cost-Optimal Deployment of Large Language Models