partition
Adaptive Clustering through Semidefinite Programming
We analyze the clustering problem through a flexible probabilistic model that aims to identify an optimal partition on the sample X1,...,Xn. We perform exact clustering with high probability using a convex semidefinite estimator that interprets as a corrected, relaxed version of K-means. The estimator is analyzed through a non-asymptotic framework and showed to be optimal or near-optimal in recovering the partition. Furthermore, its performances are shown to be adaptive to the problem's effective dimension, as well as to K the unknown number of groups in this partition. We illustrate the method's performances in comparison to other classical clustering algorithms with numerical experiments on simulated high-dimensional data.
- Oceania > Australia > New South Wales (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Information Technology > Communications > Social Media (0.95)
- Information Technology > Data Science (0.89)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
LoBoost: Fast Model-Native Local Conformal Prediction for Gradient-Boosted Trees
Santos, Vagner, Coscrato, Victor, Cabezas, Luben, Izbicki, Rafael, Ramos, Thiago
Gradient-boosted decision trees are among the strongest off-the-shelf predictors for tabular regression, but point predictions alone do not quantify uncertainty. Conformal prediction provides distribution-free marginal coverage, yet split conformal uses a single global residual quantile and can be poorly adaptive under heteroscedasticity. Methods that improve adaptivity typically fit auxiliary nuisance models or introduce additional data splits/partitions to learn the conformal score, increasing cost and reducing data efficiency. We propose LoBoost, a model-native local conformal method that reuses the fitted ensemble's leaf structure to define multiscale calibration groups. Each input is encoded by its sequence of visited leaves; at resolution level k, we group points by matching prefixes of leaf indices across the first k trees and calibrate residual quantiles within each group. LoBoost requires no retraining, auxiliary models, or extra splitting beyond the standard train/calibration split. Experiments show competitive interval quality, improved test MSE on most datasets, and large calibration speedups.
- South America > Brazil > São Paulo (0.04)
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
- North America > United States > Massachusetts (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Low-degree Lower bounds for clustering in moderate dimension
Carpentier, Alexandra, Verzelen, Nicolas
We study the fundamental problem of clustering $n$ points into $K$ groups drawn from a mixture of isotropic Gaussians in $\mathbb{R}^d$. Specifically, we investigate the requisite minimal distance $Δ$ between mean vectors to partially recover the underlying partition. While the minimax-optimal threshold for $Δ$ is well-established, a significant gap exists between this information-theoretic limit and the performance of known polynomial-time procedures. Although this gap was recently characterized in the high-dimensional regime ($n \leq dK$), it remains largely unexplored in the moderate-dimensional regime ($n \geq dK$). In this manuscript, we address this regime by establishing a new low-degree polynomial lower bound for the moderate-dimensional case when $d \geq K$. We show that while the difficulty of clustering for $n \leq dK$ is primarily driven by dimension reduction and spectral methods, the moderate-dimensional regime involves more delicate phenomena leading to a "non-parametric rate". We provide a novel non-spectral algorithm matching this rate, shedding new light on the computational limits of the clustering problem in moderate dimension.
- North America > United States (0.14)
- Europe > France > Occitanie > Hérault > Montpellier (0.04)
- Europe > Italy > Lazio > Rome (0.04)
- (3 more...)
- Asia > China (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > Canada (0.04)
- North America > Canada > British Columbia (0.04)
- Asia > Japan > Honshū > Chūgoku > Hiroshima Prefecture > Hiroshima (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Monaco (0.04)
- Europe > Italy > Calabria (0.04)
- (2 more...)
LimitstoDepth-EfficienciesofSelf-Attention
Self-attention architectures, which are rapidly pushing the frontier innatural language processing, demonstrate asurprising depth-inefficient behavior: previous works indicate that increasing the internal representation (network width) isjust as useful as increasing the number of self-attention layers (network depth).
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
- North America > United States > Missouri > Boone County > Columbia (0.13)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.13)
- Europe > Switzerland > Basel-City > Basel (0.04)
- (2 more...)
- Research Report (1.00)
- Workflow (0.67)