1f09e1ee5035a4c3fe38a5681cae5815-Supplemental-Conference.pdf

Neural Information Processing Systems

When Does Confidence-Based Cascade Deferral Suffice? (supplemental material) — A.3 Proof of Lemma 4.1. The proof first establishes an auxiliary result, Lemma A.1, which reduces Lemma 4.1 to a pointwise comparison involving the deferral rule. The appendix then provides an excess risk bound for the learned deferral rule (Lemma A.2) and a generalization bound (Lemma A.3); per Corollary 3.2, the excess risk of the learned rule decomposes into two terms, and the second term on the right-hand side is then bounded.



When Does Confidence-Based Cascade Deferral Suffice?

Neural Information Processing Systems

Cascades are a classical strategy to enable inference cost to vary adaptively across samples, wherein a sequence of classifiers are invoked in turn. A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction. One simple deferral rule employs the confidence of the current classifier, e.g., based on the maximum predicted softmax probability. Despite being oblivious to the structure of the cascade --- e.g., not modelling the errors of downstream models --- such confidence-based deferral often works remarkably well in practice. In this paper, we seek to better understand the conditions under which confidence-based deferral may fail, and when alternate deferral strategies can perform better. We first present a theoretical characterisation of the optimal deferral rule, which precisely characterises settings under which confidence-based deferral may suffer. We then study post-hoc deferral mechanisms, and demonstrate they can significantly improve upon confidence-based deferral in settings where (i) downstream models are specialists that only work well on a subset of inputs, (ii) samples are subject to label noise, and (iii) there is distribution shift between the train and test set.
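The confidence-based deferral rule the abstract describes can be sketched in a few lines. This is a minimal illustration, not code from the paper: the helper names (`confidence_deferral`, `cascade_predict`) and the fixed threshold are assumptions for the example, and each model is represented as a callable returning a softmax distribution.

```python
import numpy as np

def confidence_deferral(probs, threshold=0.9):
    """Defer whenever the maximum predicted softmax probability
    falls below the threshold. Returns a boolean mask (True = defer)."""
    confidence = np.asarray(probs).max(axis=-1)
    return confidence < threshold

def cascade_predict(x, models, threshold=0.9):
    """Invoke models in sequence; terminate at the first model whose
    confidence clears the threshold, ignoring downstream models' errors
    (the 'oblivious' property discussed in the abstract)."""
    for model in models[:-1]:
        probs = model(x)
        if probs.max() >= threshold:
            return int(probs.argmax())
    # the last model in the cascade always predicts
    return int(models[-1](x).argmax())
```

Note the rule consults only the current model's confidence; the paper's point is precisely that this can fail when, e.g., the downstream model is a specialist that would also err on the deferred input.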





A Unifying Post-Processing Framework for Multi-Objective Learn-to-Defer Problems

Charusaie, Mohammad-Amin, Samadi, Samira

arXiv.org Artificial Intelligence

Learn-to-Defer is a paradigm that enables learning algorithms to work not in isolation but as a team with human experts. In this paradigm, we permit the system to defer a subset of its tasks to the expert. Although there are currently systems that follow this paradigm and are designed to optimize the accuracy of the final human-AI team, the general methodology for developing such systems under a set of constraints (e.g., algorithmic fairness, expert intervention budget, deferral of anomalies, etc.) remains largely unexplored. In this paper, using a $d$-dimensional generalization of the fundamental lemma of Neyman and Pearson (d-GNP), we obtain the Bayes optimal solution for learn-to-defer systems under various constraints. Furthermore, we design a generalizable algorithm to estimate that solution and apply it to the COMPAS and ACSIncome datasets. Our algorithm shows improvements in terms of constraint violation over a set of baselines.


Revisiting Cascaded Ensembles for Efficient Inference

Kolawole, Steven, Dennis, Don, Talwalkar, Ameet, Smith, Virginia

arXiv.org Artificial Intelligence

A common approach to make machine learning inference more efficient is to use example-specific adaptive schemes, which route or select models for each example at inference time. In this work we study a simple scheme for adaptive inference. We build a cascade of ensembles (CoE), beginning with resource-efficient models and growing to larger, more expressive models, where ensemble agreement serves as a data-dependent routing criterion. This scheme is easy to incorporate into existing inference pipelines, requires no additional training, and can be used to place models across multiple resource tiers — for instance, serving efficient models at the edge and invoking larger models in the cloud only when necessary. In cases where parallel inference is feasible, we show that CoE can improve accuracy relative to the single best model while reducing the average cost of inference by up to 7x, and provides Pareto-dominant solutions in accuracy and efficiency relative to existing adaptive inference baselines. These savings translate to an over 3x reduction in total monetary cost when performing inference using a heterogeneous cluster of GPUs. Finally, for edge inference scenarios where portions of the cascade reside at the edge vs. in the cloud, CoE can provide a 14x reduction in communication cost and inference latency without sacrificing accuracy.
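The agreement-based routing criterion can be sketched as follows. This is a hypothetical illustration of the idea, not the authors' implementation: tier structure, helper names (`coe_predict`), and the majority-vote fallback at the final tier are assumptions made for the example.

```python
import numpy as np

def ensemble_agrees(predictions):
    """Routing criterion: accept a tier's answer only when
    all ensemble members predict the same class."""
    return len(set(predictions)) == 1

def coe_predict(x, tiers):
    """tiers: list of ensembles (lists of models), cheapest first.
    Each model is a callable returning class probabilities.
    Returns the agreed prediction of the first tier whose members
    agree; the final tier decides by majority vote."""
    for ensemble in tiers[:-1]:
        preds = [int(np.argmax(m(x))) for m in ensemble]
        if ensemble_agrees(preds):
            return preds[0]
    # no earlier tier agreed: the last (most expressive) tier decides
    final = [int(np.argmax(m(x))) for m in tiers[-1]]
    return max(set(final), key=final.count)
```

Because routing needs only the members' predicted labels, the scheme requires no additional training, matching the abstract's claim; disagreement among cheap models is the signal that an example is hard enough to warrant a costlier tier.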