A bi-objective $\epsilon$-constrained framework for quality-cost optimization in language model ensembles