Gatekeeper: Improving Model Cascades Through Confidence Tuning

Neural Information Processing Systems 

Large-scale machine learning models deliver strong performance across a wide range of tasks but come with significant computational and resource constraints. To mitigate these challenges, local smaller models are often deployed alongside larger models, relying on routing and deferral mechanisms to offload complex tasks.