Learning to Route LLMs from Bandit Feedback: One Policy, Many Trade-offs
