A bi-objective $\epsilon$-constrained framework for quality-cost optimization in language model ensembles

Singla, Aditi, Singh, Aditya, Kukreja, Kanishk

arXiv.org Artificial Intelligence 

We propose an ensembling framework that uses diverse open-source Large Language Models (LLMs) to achieve high response quality while maintaining cost efficiency. We formulate a bi-objective optimization problem to capture the quality-cost tradeoff and then introduce an additional budget constraint that reduces the problem to a straightforward 0/1 knapsack problem. We empirically demonstrate that our framework outperforms existing ensembling approaches in response quality while significantly reducing costs.

Large Language Models (LLMs) excel at traditional NLP tasks (OpenAI (2023)), but their high inference costs hinder deployment in high-throughput applications (Anonymous (2023a)). Meanwhile, open-source models are less performant than their closed-source counterparts (Beeching et al. (2023)), but they typically offer lower inference costs (Kaplan et al. (2020)).
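To make the knapsack reduction concrete, here is a minimal sketch of how an ensemble could be selected under a cost budget once the problem takes 0/1 knapsack form. The model names, quality scores, and integer costs below are illustrative assumptions, not values from the paper; the dynamic program itself is the standard 0/1 knapsack recurrence.

```python
# Hypothetical sketch: choosing which LLMs to include in an ensemble,
# maximizing total quality subject to a total inference-cost budget.
# Model names, qualities, and costs are illustrative, not from the paper.

def select_models(models, budget):
    """0/1 knapsack DP over models.

    models: list of (name, quality, cost) with non-negative integer costs.
    Returns (best_total_quality, list_of_chosen_names).
    """
    # dp[c] = (best quality achievable with total cost exactly <= c, chosen names)
    dp = [(0.0, [])] * (budget + 1)
    for name, quality, cost in models:
        # iterate costs downward so each model is used at most once
        for c in range(budget, cost - 1, -1):
            candidate = dp[c - cost][0] + quality
            if candidate > dp[c][0]:
                dp[c] = (candidate, dp[c - cost][1] + [name])
    return max(dp, key=lambda t: t[0])

# Illustrative pool: (name, quality score, relative inference cost)
models = [("llama-7b", 0.62, 1), ("mistral-7b", 0.70, 1),
          ("llama-70b", 0.85, 8), ("mixtral-8x7b", 0.80, 4)]
best_quality, chosen = select_models(models, budget=6)
```

With a budget of 6, the DP skips the expensive 70B model (cost 8) and combines the three cheaper models, illustrating how the budget constraint trades a single strong model for a diverse, affordable ensemble.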