cdsplit
CD-split: efficient conformal regions in high dimensions
Izbicki, Rafael; Shimizu, Gilson; Stern, Rafael B.
Conformal methods create prediction bands that control average coverage assuming solely i.i.d. data. Although the literature has mostly focused on prediction intervals, more general regions can often better represent uncertainty. For instance, a bimodal target is better represented by the union of two intervals. Such prediction regions are obtained by CD-split, which combines the split conformal method with a data-driven partition of the feature space that scales to high dimensions. In this paper, we provide new theoretical properties and simulations related to CD-split. We show that CD-split converges asymptotically to the oracle highest density set. In particular, we show that CD-split satisfies local and asymptotic conditional validity. We also present many new simulations, which show how to tune CD-split and compare it to other methods in the literature. In a wide variety of these simulations, CD-split has better conditional coverage and yields smaller prediction regions than other methods.
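The idea of a density-based region that splits into two intervals for a bimodal target can be illustrated with a minimal sketch. It is not the authors' CD-split implementation: it uses a single global cutoff and omits the data-driven partition of the feature space. The function `cond_density` is a hypothetical stand-in for a fitted conditional density estimator, and the data-generating process, `SD`, the grid, and all helper names are assumptions made only for illustration.

```python
import numpy as np

SD = 0.4  # assumed spread of each mode (illustrative only)

def cond_density(y, x):
    """Hypothetical stand-in for an estimate of f(y | x):
    two equal-weight normal modes centered at x - 2 and x + 2."""
    def phi(z):
        return np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return 0.5 * (phi((y - x - 2) / SD) + phi((y - x + 2) / SD)) / SD

def sample(n, rng):
    """Draw (x, y) pairs from the assumed bimodal model."""
    x = rng.uniform(-1, 1, size=n)
    shift = rng.choice([-2.0, 2.0], size=n)
    y = x + shift + SD * rng.standard_normal(n)
    return x, y

def conformal_cutoff(scores, alpha):
    """Split-conformal cutoff: the k-th smallest calibration score,
    with k = floor(alpha * (n + 1)). The region {y : f(y | x) >= cutoff}
    then has marginal coverage of at least 1 - alpha under exchangeability."""
    n = len(scores)
    k = int(np.floor(alpha * (n + 1)))
    if k < 1:
        return -np.inf  # too few calibration points: region is the whole line
    return np.sort(scores)[k - 1]

def region_intervals(grid, dens, cutoff):
    """Turn the level set {dens >= cutoff} on a grid into a list of intervals."""
    inside = dens >= cutoff
    intervals, start = [], None
    for i, ok in enumerate(inside):
        if ok and start is None:
            start = grid[i]
        elif not ok and start is not None:
            intervals.append((start, grid[i - 1]))
            start = None
    if start is not None:
        intervals.append((start, grid[-1]))
    return intervals

rng = np.random.default_rng(0)
x_cal, y_cal = sample(1000, rng)  # calibration split
cutoff = conformal_cutoff(cond_density(y_cal, x_cal), alpha=0.1)

grid = np.linspace(-5, 5, 2001)
intervals = region_intervals(grid, cond_density(grid, 0.0), cutoff)
print(intervals)  # prediction region at x = 0, as a union of intervals
```

In this sketch the conformity score is the density evaluated at the observed response, so the region is a level set of the density rather than a single interval. A local variant would compute the cutoff only from calibration points falling in the same partition element as the new feature vector.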
Distribution-free conditional predictive bands using density estimators
Izbicki, Rafael; Shimizu, Gilson T.; Stern, Rafael B.
Conformal methods create prediction bands that control average coverage under no assumptions besides i.i.d. data. Beyond average coverage, one might also desire to control conditional coverage, that is, coverage for every new testing point. However, without strong assumptions, conditional coverage is unachievable. Given this limitation, the literature has focused on methods with asymptotic conditional coverage. In order to obtain this property, these methods require strong conditions on the dependence between the target variable and the features. We introduce two conformal methods based on conditional density estimators that do not depend on this type of assumption to obtain asymptotic conditional coverage: Dist-split and CD-split. While Dist-split asymptotically obtains optimal intervals, which are easier to interpret than general regions, CD-split obtains optimal size regions, which are smaller than intervals. CD-split also obtains local coverage by creating a data-driven partition of the feature space that scales to high-dimensional settings and by generating prediction bands locally on the partition elements. In a wide variety of simulated scenarios, our methods have better control of conditional coverage and have smaller length than previously proposed methods.