C3PO: Optimized Large Language Model Cascades with Probabilistic Cost Constraints for Reasoning

Jun-12-2026, 22:00:27 GMT–Neural Information Processing Systems

Large language models (LLMs) have achieved impressive results on complex reasoning tasks, but their high inference cost remains a major barrier to real-world deployment. A promising solution is to use cascaded inference, where small, cheap models handle easy queries, and only the hardest examples are escalated to more powerful models. However, existing cascade methods typically rely on supervised training with labeled data, offer no theoretical generalization guarantees, and provide limited control over test-time computational cost.

artificial intelligence, large language model, natural language, (8 more...)

Neural Information Processing Systems

Jun-12-2026, 22:00:27 GMT

Conferences Web Page

Add feedback

Genre:
- Research Report (0.56)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)