confidence-based deferral
When Does Confidence-Based Cascade Deferral Suffice?
Jitkrittum, Wittawat; Gupta, Neha; Menon, Aditya Krishna; Narasimhan, Harikrishna; Rawat, Ankit Singh; Kumar, Sanjiv

Cascades are a classical strategy to enable inference cost to vary adaptively across samples, wherein a sequence of classifiers is invoked in turn. A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction. One simple deferral rule employs the confidence of the current classifier, e.g., based on the maximum predicted softmax probability. Despite being oblivious to the structure of the cascade --- e.g., not modelling the errors of downstream models --- such confidence-based deferral often works remarkably well in practice. In this paper, we seek to better understand the conditions under which confidence-based deferral may fail, and when alternate deferral strategies can perform better. We first present a theoretical characterisation of the optimal deferral rule, which precisely characterises settings under which confidence-based deferral may suffer. We then study post-hoc deferral mechanisms, and demonstrate they can significantly improve upon confidence-based deferral in settings where (i) downstream models are specialists that only work well on a subset of inputs, (ii) samples are subject to label noise, and (iii) there is distribution shift between the train and test set.
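To make the confidence-based rule concrete, here is a minimal NumPy sketch of a two-model cascade; `model1` and `model2` (callables returning per-sample softmax probabilities) and the 0.9 threshold are illustrative assumptions, not details from the paper.

```python
import numpy as np

def confidence_deferral(probs, threshold=0.9):
    """Flag samples whose maximum softmax probability falls below the
    threshold; these are deferred to the next model in the cascade."""
    confidence = np.max(probs, axis=-1)  # per-sample max softmax probability
    return confidence < threshold        # True => defer

def cascade_predict(x, model1, model2, threshold=0.9):
    """Two-model cascade with confidence-based deferral: model2 is invoked
    only on samples where model1 is under-confident, so inference cost
    adapts across samples. model1/model2 are assumed callables mapping a
    batch of inputs to softmax probabilities of shape (n_samples, n_classes)."""
    p1 = model1(x)
    defer = confidence_deferral(p1, threshold)
    preds = p1.argmax(axis=-1)
    if defer.any():
        preds[defer] = model2(x[defer]).argmax(axis=-1)
    return preds, defer
```

The design point worth noting is that this rule inspects only the current model's output: it never models the downstream classifier's errors, which is exactly the blind spot the paper probes.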
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Vision (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
1f09e1ee5035a4c3fe38a5681cae5815-Supplemental-Conference.pdf
Excerpt from Appendix A.3 (Proof of Lemma 4.1):
We start with Lemma A.1, which will help prove Lemma 4.1. [...] We are ready to prove Lemma 4.1. By Lemma A.1, this is equivalent to showing that E(1[η ...]) [...]. We provide an excess risk bound in Lemma A.2 and a generalization bound in Lemma A.3. The excess risk for the learned deferral rule can be bounded as follows (Lemma A.2). Per Corollary 3.2, the excess risk for r̂ can then be written as R(r̂; h ...) [...]. We next bound the second term on the right-hand side.
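The truncated expressions above (E(1[η ...]) and R(r̂; h ...)) appear to come from the cascade risk and its excess-risk decomposition. For context, the following is a standard formalisation of two-model deferral with a deferral cost, written as an illustrative sketch under assumed notation (models h_1, h_2, deferral rule r, cost c), not a reconstruction of the paper's exact statements.

```latex
% Illustrative sketch, assuming models h_1, h_2, deferral rule r, cost c.
% Risk of a two-model cascade: pay h_1's error when we stop, and h_2's
% error plus the deferral cost c when we defer.
\[
  R(r; h_1, h_2) = \mathbb{E}_{(x,y)}\!\left[
    (1 - r(x))\,\mathbf{1}[h_1(x) \neq y]
    + r(x)\,\bigl(\mathbf{1}[h_2(x) \neq y] + c\bigr)
  \right]
\]
% Minimising pointwise over r(x) \in \{0, 1\} yields a Bayes-optimal rule
% that defers iff the downstream error probability plus cost is smaller:
\[
  r^{*}(x) = \mathbf{1}\bigl[
    \mathbb{P}(h_2(x) \neq y \mid x) + c < \mathbb{P}(h_1(x) \neq y \mid x)
  \bigr]
\]
% The excess risk R(\hat{r}; h_1, h_2) - R(r^{*}; h_1, h_2) then measures
% how far a learned rule \hat{r} is from this oracle; confidence-based
% deferral approximates it using only the current model's confidence.
```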