Analytical Extraction of Conditional Sobol' Indices via Basis Decomposition of Polynomial Chaos Expansions

Zhong, Shijie, Fu, Jiangfeng

arXiv.org Machine Learning

In uncertainty quantification, evaluating sensitivity measures under specific conditions (i.e., conditional Sobol' indices) is essential for systems with parameterized responses, such as spatial fields or varying operating conditions. Traditional approaches often rely on point-wise modeling, which is computationally expensive and may lack consistency across the parameter space. This paper demonstrates that for a pre-trained global Polynomial Chaos Expansion (PCE) model, the analytical conditional Sobol' indices are inherently embedded within its basis functions. By leveraging the tensor-product property of PCE bases, we reformulate the global expansion into a set of analytical coefficient fields that depend on the conditioning variables. Based on the preservation of orthogonality under conditional probability measures, we derive closed-form expressions for conditional variances and Sobol' indices. This framework bypasses the need for repetitive modeling or additional sampling, transforming conditional sensitivity analysis into a purely algebraic post-processing step. Numerical benchmarks indicate that the proposed method ensures physical coherence and offers superior numerical robustness and computational efficiency compared to conventional point-wise approaches.
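As a concrete illustration of the algebraic post-processing the abstract describes, the following minimal Python sketch collapses the tensor-product basis factor of a conditioning variable and reads conditional Sobol' indices off the remaining coefficients. It assumes independent Uniform(-1, 1) inputs with an orthonormal Legendre basis; the dictionary layout of `coeffs` and the function names are our illustrative assumptions, not the authors' implementation.

    import numpy as np
    from numpy.polynomial import legendre

    def psi(n, x):
        # Orthonormal Legendre polynomial under Uniform(-1, 1):
        # E[psi_n(X)^2] = 1 because the density is dx/2 on [-1, 1].
        c = np.zeros(n + 1)
        c[n] = 1.0
        return np.sqrt(2 * n + 1) * legendre.legval(x, c)

    def conditional_sobol(coeffs, cond_dim, x_cond, n_inputs):
        # coeffs: {multi-index tuple: coefficient} of a global PCE.
        # Collapse the basis factor of the conditioned dimension at x_cond,
        # yielding coefficient "fields" over the remaining inputs.
        collapsed = {}
        for alpha, c in coeffs.items():
            beta = tuple(a for i, a in enumerate(alpha) if i != cond_dim)
            collapsed[beta] = collapsed.get(beta, 0.0) + c * psi(alpha[cond_dim], x_cond)
        zero = (0,) * (n_inputs - 1)
        # Orthonormality is preserved under the conditional measure when
        # inputs are independent, so variances are sums of squares.
        total_var = sum(c ** 2 for b, c in collapsed.items() if b != zero)
        remaining = [i for i in range(n_inputs) if i != cond_dim]
        sobol = {}
        for j, dim in enumerate(remaining):
            # First-order index: multi-indices supported only on dimension j.
            v = sum(c ** 2 for b, c in collapsed.items()
                    if b[j] > 0 and all(b[k] == 0 for k in range(len(b)) if k != j))
            sobol[dim] = v / total_var
        return sobol, total_var

Because the collapse is exact for a tensor-product basis, sweeping x_cond over a grid yields a consistent field of conditional indices with no refitting or resampling.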


Beyond Augmented-Action Surrogates for Multi-Expert Learning-to-Defer

Montreuil, Yannis, Carlier, Axel, Ng, Lai Xing, Ooi, Wei Tsang

arXiv.org Machine Learning

Existing multi-expert learning-to-defer surrogates are statistically consistent, yet they can underfit, suppress useful experts, or degrade as the expert pool grows. We trace these failures to a shared architectural choice: casting classes and experts as actions inside one augmented prediction geometry. Consistency governs the population target; it says nothing about how the surrogate distributes gradient mass during training. We analyze five surrogates along both axes and show that each trades a fix on one for a failure on the other. We then introduce a decoupled surrogate that estimates the class posterior with a softmax and each expert's utility with an independent sigmoid. It admits an $\mathcal{H}$-consistency bound whose constant is $J$-independent for fixed per-expert weight $\beta = \lambda/J$, and its gradients are free of the amplification, starvation, and coupling pathologies of the augmented family. Experiments on synthetic benchmarks, CIFAR-10, CIFAR-10H, and Covertype confirm that the decoupled surrogate is the only method that avoids amplification under redundancy, preserves rare specialists, and consistently improves over a standalone classifier across all settings.
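A minimal PyTorch sketch of the decoupled design described above, under assumptions of ours: a shared backbone emits class_logits of shape (B, C) and expert_logits of shape (B, J), and expert_correct marks which experts are correct on each example. Function and tensor names are illustrative, not the authors' code.

    import torch
    import torch.nn.functional as F

    def decoupled_l2d_loss(class_logits, expert_logits, expert_correct, y, beta):
        # Class posterior through a standard softmax cross-entropy head.
        cls_loss = F.cross_entropy(class_logits, y)
        # One independent sigmoid per expert: no shared normalizer, so
        # scoring one expert up does not push the others down (no coupling),
        # and redundant experts do not amplify gradient mass elsewhere.
        per_expert = F.binary_cross_entropy_with_logits(
            expert_logits, expert_correct.float(), reduction="none")
        # beta = lambda / J keeps the effective per-expert weight, and
        # hence the bound's constant, independent of the pool size J.
        return cls_loss + beta * per_expert.sum(dim=1).mean()

At inference, a natural deferral rule routes to the expert with the highest estimated utility when it exceeds the classifier's estimated correctness, but the deferral rule is separate from the surrogate itself.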


Learning-to-Defer with Expert-Conditioned Advice

Montreuil, Yannis, Montreuil, Leïna, Carlier, Axel, Ng, Lai Xing, Ooi, Wei Tsang

arXiv.org Machine Learning

Learning-to-Defer routes each input to the expert that minimizes expected cost, but it assumes that the information available to every expert is fixed at decision time. Many modern systems violate this assumption: after selecting an expert, one may also choose what additional information that expert should receive, such as retrieved documents, tool outputs, or escalation context. We study this problem and call it Learning-to-Defer with advice. We show that a broad family of natural separated surrogates, which learn routing and advice with distinct heads, is inconsistent even in the smallest non-trivial setting. We then introduce an augmented surrogate that operates on the composite expert–advice action space and prove an $\mathcal{H}$-consistency guarantee together with an excess-risk transfer bound, yielding recovery of the Bayes-optimal policy in the limit. Experiments on tabular, language, and multi-modal tasks show that the resulting method improves over standard Learning-to-Defer while adapting its advice-acquisition behavior to the cost regime; a synthetic benchmark confirms the failure mode predicted for separated surrogates.
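The composite action space is easy to make concrete. In the sketch below (our illustration, not the paper's construction), a model scores C class actions plus one action per (expert, advice) pair, and a standard cost-weighted log-loss over all C + J*A actions stands in for the consistent augmented surrogate; costs would include both expert error under the chosen advice and any advice-acquisition cost.

    import torch
    import torch.nn.functional as F

    def augmented_advice_loss(logits, costs):
        # logits, costs: (B, C + J*A); entries 0..C-1 score the class
        # predictions, the rest enumerate (expert j, advice a) pairs.
        log_p = F.log_softmax(logits, dim=1)
        # Weight each action's log-probability by the cost it saves
        # relative to the worst action, so the minimizer puts its
        # largest score on the minimum-cost composite action.
        gains = costs.max(dim=1, keepdim=True).values - costs
        return -(gains * log_p).sum(dim=1).mean()

Because the weighted log-loss is minimized by probabilities proportional to the gains, the argmax over the composite actions recovers the least-cost (expert, advice) choice.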


Quantifying Learning Guarantees for Convex but Inconsistent Surrogates

Neural Information Processing Systems

We study consistency properties of machine learning methods based on minimizing convex surrogates. We extend the recent framework of Osokin et al. (2017) for the quantitative analysis of consistency properties to the case of inconsistent surrogates. Our key technical contribution consists in a new lower bound on the calibration function for the quadratic surrogate, which is non-trivial (not always zero) for inconsistent cases. The new bound allows us to quantify the level of inconsistency of the setting and shows how learning with inconsistent surrogates can still come with guarantees on sample complexity and optimization difficulty. We apply our theory to two concrete cases: multi-class classification with the tree-structured loss and ranking with the mean average precision loss. The results show the approximation-computation trade-offs caused by inconsistent surrogates and their potential benefits.
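For readers unfamiliar with the framework, the quantity being bounded can be stated compactly. Writing $\delta\phi(f)$ and $\delta\ell(f)$ for the surrogate and target excess risks, the calibration function of Osokin et al. (2017) is, in slightly simplified notation,

\[
H(\varepsilon) \;=\; \inf_{f}\,\bigl\{\, \delta\phi(f) \;:\; \delta\ell(f) \ge \varepsilon \,\bigr\},
\]

so any $f$ with surrogate excess below $H(\varepsilon)$ has target excess below $\varepsilon$. A surrogate is calibrated when $H(\varepsilon) > 0$ for all $\varepsilon > 0$; for an inconsistent surrogate, $H$ vanishes on an interval $[0, \varepsilon_0]$, and a lower bound on $H$ that is non-trivial for $\varepsilon > \varepsilon_0$ turns $\varepsilon_0$ into a quantitative measure of the level of inconsistency.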


Scalable Robust Matrix Factorization with Nonconvex Loss

Neural Information Processing Systems

Robust matrix factorization (RMF), which uses the $\ell_1$-loss, often outperforms standard matrix factorization using the $\ell_2$-loss, particularly when outliers are present. The state-of-the-art RMF solver is the RMF-MM algorithm, which, however, cannot utilize data sparsity. Moreover, sometimes even the (convex) $\ell_1$-loss is not robust enough. In this paper, we propose the use of nonconvex loss to enhance robustness. To address the resultant difficult optimization problem, we use majorization-minimization (MM) optimization and propose a new MM surrogate. To improve scalability, we exploit data sparsity and optimize the surrogate via its dual with the accelerated proximal gradient algorithm. The resultant algorithm has low time and space complexities and is guaranteed to converge to a critical point. Extensive experiments demonstrate its superiority over the state-of-the-art in terms of both accuracy and scalability.
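To make the MM step concrete, here is a generic IRLS-style majorization for matrix factorization with the nonconvex Welsch loss $\rho(r) = 1 - \exp(-r^2/(2\sigma^2))$. This is our illustrative sketch of the majorize-then-minimize pattern, not the paper's specific surrogate (which is optimized via its dual with an accelerated proximal gradient algorithm); the Welsch choice and all names are assumptions.

    import numpy as np

    def robust_mf_mm(X, mask, rank, sigma=1.0, iters=50, seed=0):
        # mask: 1 on observed entries, 0 elsewhere (data sparsity).
        rng = np.random.default_rng(seed)
        m, n = X.shape
        U = rng.standard_normal((m, rank))
        V = rng.standard_normal((n, rank))
        for _ in range(iters):
            R = mask * (X - U @ V.T)
            # Concavity of s -> 1 - exp(-s / (2 sigma^2)) in s = r^2 gives
            # the quadratic majorizer rho(r) <= const + w * r^2, with
            # w proportional to exp(-r0^2 / (2 sigma^2)) at the current
            # residual r0 (the 1/sigma^2 factor cancels in the argmin).
            W = mask * np.exp(-R ** 2 / (2 * sigma ** 2))
            # Minimize the majorizer by alternating weighted least squares.
            for i in range(m):
                A = np.sqrt(W[i])[:, None] * V
                b = np.sqrt(W[i]) * X[i]
                U[i] = np.linalg.lstsq(A, b, rcond=None)[0]
            for j in range(n):
                A = np.sqrt(W[:, j])[:, None] * U
                b = np.sqrt(W[:, j]) * X[:, j]
                V[j] = np.linalg.lstsq(A, b, rcond=None)[0]
        return U, V

Each sweep decreases a majorizer that upper-bounds the Welsch objective and touches it at the current iterate, giving the monotone descent characteristic of MM schemes.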



Broader Impact

Neural Information Processing Systems

Our work designs privacy attacks, which have the potential to cause harm. The main limitation of our work is the strong threat model under which our attacks work. All of our results on CIFAR-10 make use of fewer than 30000 trained models. Supporting figures plot the effectiveness of Transfer LiRA (Figure 7), ROC curves for our student attacks, further qualitative examples (Figure 9), and an ablation of score information on CIFAR-10 with duplicates (Figure 11), across the distillation threat models we consider simultaneously.