Beyond Augmented-Action Surrogates for Multi-Expert Learning-to-Defer
Yannis Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi
Existing multi-expert learning-to-defer surrogates are statistically consistent, yet they can underfit, suppress useful experts, or degrade as the expert pool grows. We trace these failures to a shared architectural choice: casting classes and experts as actions inside one augmented prediction geometry. Consistency governs the population target; it says nothing about how the surrogate distributes gradient mass during training. We analyze five surrogates along both axes and show that each trades a fix on one for a failure on the other. We then introduce a decoupled surrogate that estimates the class posterior with a softmax and each expert utility with an independent sigmoid. It admits an $\mathcal{H}$-consistency bound whose constant does not depend on the number of experts $J$ for fixed per-expert weight $\beta = \lambda / J$, and its gradients are free of the amplification, starvation, and coupling pathologies of the augmented family. Experiments on synthetic benchmarks, CIFAR-10, CIFAR-10H, and Covertype confirm that the decoupled surrogate is the only method that avoids amplification under redundancy, preserves rare specialists, and consistently improves over a standalone classifier across all settings.
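To make the decoupled idea concrete, here is a minimal sketch of one plausible per-example loss. The abstract does not give the loss itself, so the following are assumptions rather than the authors' exact formulation: a cross-entropy term on the softmax class head, a binary cross-entropy term on each expert's sigmoid head with target "expert j predicted the true label", and a sum weighted by $\beta = \lambda / J$. The function name `decoupled_loss` and its arguments are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def decoupled_loss(class_logits, expert_logits, label, expert_preds, lam=1.0):
    """Illustrative decoupled surrogate for one example (an assumption,
    not the paper's stated definition).

    class_logits  : scores for the softmax class head
    expert_logits : one score per expert, each passed through its own sigmoid
    label         : index of the true class
    expert_preds  : class index predicted by each expert
    lam           : total expert weight lambda; per-expert weight is lam / J
    """
    num_experts = len(expert_logits)
    beta = lam / num_experts  # beta = lambda / J, as in the abstract

    # Class head: cross-entropy on the softmax posterior estimate.
    class_term = -log_softmax(class_logits)[label]

    # Expert heads: independent binary cross-entropy per expert.
    # BCE with logits g and target c equals softplus(g) - c * g.
    expert_term = 0.0
    for g, pred in zip(expert_logits, expert_preds):
        correct = 1.0 if pred == label else 0.0
        expert_term += softplus(g) - correct * g

    return class_term + beta * expert_term


loss = decoupled_loss(
    class_logits=[0.0, 0.0],
    expert_logits=[0.0, 0.0],
    label=0,
    expert_preds=[0, 1],
    lam=1.0,
)
```

In this sketch each expert's gradient depends only on its own logit, so adding a redundant expert changes neither the class-head gradient nor any other expert's gradient; that is the decoupling property the abstract contrasts with the augmented-action family.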
Apr-20-2026
- Country:
- Asia > Singapore (0.04)
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
- North America > United States (0.04)
- Genre:
- Research Report (0.50)
- Technology: