Beyond Augmented-Action Surrogates for Multi-Expert Learning-to-Defer

Montreuil, Yannis, Carlier, Axel, Ng, Lai Xing, Ooi, Wei Tsang

arXiv.org Machine Learning

Existing multi-expert learning-to-defer surrogates are statistically consistent, yet they can underfit, suppress useful experts, or degrade as the expert pool grows. We trace these failures to a shared architectural choice: casting classes and experts as actions inside one augmented prediction geometry. Consistency governs the population target; it says nothing about how the surrogate distributes gradient mass during training. We analyze five surrogates along both axes and show that each trades a fix on one for a failure on the other. We then introduce a decoupled surrogate that estimates the class posterior with a softmax and each expert utility with an independent sigmoid. It admits an $\mathcal{H}$-consistency bound whose constant is $J$-independent for fixed per-expert weight $\beta = \lambda/J$, and its gradients are free of the amplification, starvation, and coupling pathologies of the augmented family. Experiments on synthetic benchmarks, CIFAR-10, CIFAR-10H, and Covertype confirm that the decoupled surrogate is the only method that avoids amplification under redundancy, preserves rare specialists, and consistently improves over a standalone classifier across all settings.
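The abstract's decoupled design (softmax over classes, one independent sigmoid per expert, per-expert weight $\beta = \lambda/J$) can be illustrated with a minimal sketch. This is not the paper's implementation; the function name `decoupled_l2d_loss`, the `lam` parameter, and the binary `expert_correct` labels are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decoupled_l2d_loss(class_logits, expert_logits, y, expert_correct, lam=1.0):
    """Illustrative decoupled learning-to-defer surrogate (not the authors' code).

    class_logits:   raw scores for the class posterior (softmax head)
    expert_logits:  one raw score per expert, each with its own sigmoid
    y:              true class index
    expert_correct: 0/1 per expert, 1 if that expert is correct on this input
    lam:            total deferral weight; each expert gets beta = lam / J
    """
    J = len(expert_logits)
    beta = lam / J                       # J-independent per-expert weight
    p = softmax(class_logits)
    ce = -np.log(p[y])                   # cross-entropy for the class head
    q = sigmoid(expert_logits)
    # Independent binary cross-entropy per expert: gradients for one expert
    # do not couple with, or get amplified by, the others.
    bce = -(expert_correct * np.log(q) + (1 - expert_correct) * np.log(1 - q))
    return ce + beta * bce.sum()
```

One consequence of the $\beta = \lambda/J$ scaling is invariance under expert redundancy: duplicating an expert doubles $J$ but halves each expert's weight, so the total loss is unchanged, which matches the claimed robustness to amplification as the expert pool grows.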


Neural Information Processing Systems

For all other remaining architectures, the reported results are from private datasets. The Neck Shaft Angle (NSA) cannot be estimated. Additionally, [? ] requires estimation of the diaphysis axis.

Figure 4: Repeatability of the femur morphometry extraction method, as measured by error distributions for (a) the landmarks/anatomical sizes and (b) axis alignment identified by the adapted method.

Checklist excerpts:
- Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope?
- Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? Data splits are available in the GitHub repository.
- Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)?
- Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)?
- Did you mention the license of the assets?