
Collaborating Authors



Efficient Contextual LLM Cascades through Budget-Constrained Policy Learning

Neural Information Processing Systems

Recent successes in natural language processing have led to the proliferation of large language models (LLMs) from multiple providers. Each LLM offering has different inference accuracy, monetary cost, and latency, and its accuracy further depends on the exact wording of the question (i.e. …
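The cascade idea in the abstract above can be sketched as a simple escalation loop: query cheaper models first and fall back to more expensive ones while confidence is low and the remaining monetary budget allows. This is a minimal illustrative sketch, not the paper's learned policy; the model names, costs, confidences, and threshold are all assumed for illustration.

```python
def run_cascade(question, models, budget, threshold=0.8):
    """models: list of (name, cost, infer_fn) ordered cheapest first.
    infer_fn(question) -> (answer, confidence).
    Stops when confidence reaches the threshold or the budget runs out."""
    spent = 0.0
    answer = None
    for name, cost, infer in models:
        if spent + cost > budget:
            break  # cannot afford the next model
        answer, conf = infer(question)
        spent += cost
        if conf >= threshold:
            break  # confident enough, stop escalating
    return answer, spent

# Toy stand-ins for real LLM calls (costs and confidences are made up).
models = [
    ("small", 0.1, lambda q: ("draft answer", 0.6)),
    ("large", 1.0, lambda q: ("refined answer", 0.95)),
]
print(run_cascade("What is 2+2?", models, budget=2.0))
# ('refined answer', 1.1)
```

With a tighter budget (e.g. `budget=0.5`) the loop stops after the small model, returning its draft answer at cost 0.1; the budget constraint, not confidence, decides when to stop.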








Hybrid Unrolled Multi-Scale Net

Neural Information Processing Systems

The number of cascades in unrolled networks has a fundamental impact on their performance. The results are summarized in Table 3. We observe that ASR boosts the reconstruction quality of E2E-VarNet. Traditional Transformers for NLP receive a sequence of 1D token embeddings. The input to the Transformer encoder is this N × D representation, which we also refer to in the paper as the token representation, as each row in the representation corresponds to a token (in our case, an image patch) in the original input.
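The N × D token representation described above can be sketched with NumPy: split the image into non-overlapping patches, flatten each patch, and project it to the model dimension, so that each row of the result is one token. The patch size, model dimension, and random projection here are illustrative assumptions; in a real ViT-style encoder the projection is learned.

```python
import numpy as np

def image_to_tokens(image, patch, d_model, rng):
    """Split an H x W x C image into non-overlapping patches and project
    each flattened patch to d_model dims, yielding an N x D token matrix."""
    H, W, C = image.shape
    assert H % patch == 0 and W % patch == 0
    n = (H // patch) * (W // patch)  # number of tokens N
    # Rearrange so each row is one flattened patch of length patch*patch*C.
    patches = (image.reshape(H // patch, patch, W // patch, patch, C)
                    .transpose(0, 2, 1, 3, 4)
                    .reshape(n, patch * patch * C))
    # Random linear projection standing in for a learned embedding.
    W_embed = rng.standard_normal((patch * patch * C, d_model)) * 0.02
    return patches @ W_embed  # N x D token representation

rng = np.random.default_rng(0)
tokens = image_to_tokens(rng.standard_normal((32, 32, 3)),
                         patch=8, d_model=64, rng=rng)
print(tokens.shape)  # (16, 64): 16 patch tokens, each a 64-dim embedding
```

A 32 × 32 image with 8 × 8 patches yields (32/8)² = 16 tokens, matching the "each row is a token" reading of the N × D representation.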


1f09e1ee5035a4c3fe38a5681cae5815-Supplemental-Conference.pdf

Neural Information Processing Systems

When Does Confidence-Based Cascade Deferral Suffice? A.3 Proof of Lemma 4.1. We start with Lemma A.1, which will help prove Lemma 4.1. We are then ready to prove Lemma 4.1: by Lemma A.1, this is equivalent to showing that E(1[η … We provide an excess risk bound in Lemma A.2 and a generalization bound in Lemma A.3. The excess risk for the learned deferral rule can be bounded as follows (Lemma A.2). Per Corollary 3.2, the excess risk for r̂ can then be written as R(r̂; h … We next bound the second term on the right-hand side.
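A confidence-based deferral rule of the kind this analysis studies can be sketched as follows: defer an input to the larger model when the smaller model's confidence (its maximum predicted class probability) falls below a threshold. The threshold and probability vectors here are assumed for illustration, not taken from the paper.

```python
def defer(probs, threshold=0.7):
    """probs: the small model's class-probability vector for one input.
    Returns True if the example should be deferred to the large model."""
    return max(probs) < threshold

print(defer([0.55, 0.45]))  # True: low confidence, defer
print(defer([0.9, 0.1]))    # False: confident, keep the small model's prediction
```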