Goto

Collaborating Authors

 proposition 7



Conformal Robust Set Estimation

arXiv.org Machine Learning

Conformal prediction provides finite-sample, distribution-free coverage under exchangeability, but standard constructions may lack robustness in the presence of outliers or heavy tails. We propose a robust conformal method based on a non-conformity score defined as the half-mass radius around a point, equivalently the distance to its $(\lfloor n/2\rfloor+1)$-nearest neighbour. We show that the resulting conformal regions are marginally valid for any sample size and converge in probability to a robust population central set defined through a distance-to-a-measure functional. Under mild regularity conditions, we establish exponential concentration and tail bounds that quantify the deviation between the empirical conformal region and its population counterpart. These results provide a probabilistic justification for using robust geometric scores in conformal prediction, even for heavy-tailed or multi-modal distributions.


Beyond Augmented-Action Surrogates for Multi-Expert Learning-to-Defer

arXiv.org Machine Learning

Existing multi-expert learning-to-defer surrogates are statistically consistent, yet they can underfit, suppress useful experts, or degrade as the expert pool grows. We trace these failures to a shared architectural choice: casting classes and experts as actions inside one augmented prediction geometry. Consistency governs the population target; it says nothing about how the surrogate distributes gradient mass during training. We analyze five surrogates along both axes and show that each trades a fix on one for a failure on the other. We then introduce a decoupled surrogate that estimates the class posterior with a softmax and each expert utility with an independent sigmoid. It admits an $\mathcal{H}$-consistency bound whose constant is $J$-independent for fixed per-expert weight $β{=}λ/J$, and its gradients are free of the amplification, starvation, and coupling pathologies of the augmented family. Experiments on synthetic benchmarks, CIFAR-10, CIFAR-10H, and Covertype confirm that the decoupled surrogate is the only method that avoids amplification under redundancy, preserves rare specialists, and consistently improves over a standalone classifier across all settings.




Fast Sampling for Flows and Diffusions with Lazy and Point Mass Stochastic Interpolants

arXiv.org Machine Learning

Stochastic interpolants unify flows and diffusions, popular generative modeling frameworks. A primary hyperparameter in these methods is the interpolation schedule that determines how to bridge a standard Gaussian base measure to an arbitrary target measure. We prove how to convert a sample path of a stochastic differential equation (SDE) with arbitrary diffusion coefficient under any schedule into the unique sample path under another arbitrary schedule and diffusion coefficient. We then extend the stochastic interpolant framework to admit a larger class of point mass schedules in which the Gaussian base measure collapses to a point mass measure. Under the assumption of Gaussian data, we identify lazy schedule families that make the drift identically zero and show that with deterministic sampling one gets a variance-preserving schedule commonly used in diffusion models, whereas with statistically optimal SDE sampling one gets our point mass schedule. Finally, to demonstrate the usefulness of our theoretical results on realistic highly non-Gaussian data, we apply our lazy schedule conversion to a state-of-the-art pretrained flow model and show that this allows for generating images in fewer steps without retraining the model.





When are Kalman-Filter Restless Bandits Indexable?

Neural Information Processing Systems

We study the restless bandit associated with an extremely simple scalar Kalman filter model in discrete time. Under certain assumptions, we prove that the problem is indexable in the sense that the Whittle index is a non-decreasing function of the relevant belief state. In spite of the long history of this problem, this appears to be the first such proof. We use results about Schur-convexity and mechanical words, which are particular binary strings intimately related to palindromes.