 procedure


How Turkey Hacked the Hair Transplant Industry

WIRED

From specialized motors to the use of machine-learning algorithms, Turkey's billion-dollar hair-transplant industry is the result of a constant process of innovation. The astounding growth of the hair-transplant industry in Turkey is not just a medical tourism success story; it's also a tale of "hacked" medical equipment and algorithmic craftsmanship. From a biological and evolutionary perspective, human hair is often viewed as an unremarkable mass of keratin that still serves some important functions (protecting our scalps from the sun's harmful ultraviolet rays and regulating our body temperature) but, for the most part, is no longer essential to our survival. Yet, since ancient times, our subconscious perceptions of whether another person is healthy, young, or fertile have been based on visual cues such as skin radiance, the integrity of teeth, and hair density. Deep within our perceptions, hair has become one of the most powerful representations of our identity and self-confidence. Today, the global hair-transplant and restoration industry, which has evolved around this deep psychological and evolutionary need, has grown into a massive, multibillion-dollar business. Various research firms have estimated the total size of the global hair-transplant market at somewhere between $7.33 billion and $11.61 billion in 2024. And those figures don't include the underground economy.


Bayesian Multiplicity Correction in the Probabilistic Forward Stepwise Framework

arXiv.org Machine Learning

We develop a natural Bayesian multiplicity-correcting prior distribution within the probabilistic forward stepwise representation of model space priors for regression problems. The proposed prior, obtained from making an analogy to the Holm procedure, exhibits behavior closely aligned with that of the Matryoshka doll prior. We compare both priors to several other priors, including some recently put forward as objective choices for model space prior probabilities. Our comparisons indicate that adequate multiplicity correction requires a degree of sparsity that many recommended priors do not provide, and we argue that multiplicity correction itself offers a principled and transparent criterion for specifying model space priors in regression.
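
For readers unfamiliar with the Holm procedure mentioned in the abstract, the following is a minimal sketch of the classical frequentist step-down procedure only. It is not the paper's prior, and the function name `holm_reject` and its parameters are illustrative assumptions. The procedure sorts the p-values and compares the k-th smallest against `alpha / (m - k + 1)`, stopping at the first failure.

```python
def holm_reject(p_values, alpha=0.05):
    """Classical Holm step-down procedure (illustrative helper, not from the paper).

    Returns a list of booleans, True where the corresponding hypothesis is rejected.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The (rank + 1)-th smallest p-value is compared with alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down rule: stop at the first non-rejection
    return reject


decisions = holm_reject([0.01, 0.04, 0.03, 0.005], alpha=0.05)
```

The thresholds become less strict as hypotheses are rejected, which is the sequential structure the abstract draws an analogy to.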


Structure-Adaptive Conformal Inference for Large-Scale Out-of-Distribution Testing

arXiv.org Machine Learning

This paper addresses structured out-of-distribution (OOD) testing in high-stakes machine learning applications. Traditional conformal methods rely on joint exchangeability, making it difficult to incorporate auxiliary information such as spatiotemporal or grouping structures. To overcome this limitation, we propose the structure-adaptive conformal q-value (SCQ), a significance index that integrates individual test evidence with structural patterns. We also develop pseudo-score-guided transductive automated model selection (P-TAMS), which adapts conformalized model selection to structured OOD testing across a toolbox of candidate models. Together, SCQ and P-TAMS form a unified framework under pairwise exchangeability, providing finite-sample error-rate control, improved power, and enhanced interpretability. Experiments on simulated and real data demonstrate that the proposed approach controls the false discovery rate and performs well across diverse settings.


Proper Calibeating

arXiv.org Machine Learning

The classic concept of "calibrated forecasts" and its more recent refinement, "calibeating," are defined with respect to the standard quadratic scoring rule. We extend these notions to the class of $\textit{proper}$ scoring rules (for which the best forecast is the true distribution) and define $\textit{proper-calibration}$ and $\textit{proper-calibeating}$ by requiring the errors to converge to zero uniformly over all bounded proper scoring rules. We first establish that calibration always implies proper-calibration, whereas calibeating need not imply proper-calibeating. Second, we show how to guarantee proper-calibeating and proper-multicalibeating. Finally, we demonstrate the equivalence between proper-calibration and universal no regret when best replying to forecasts in decision-making under uncertainty.


High-Dimensional Change-Point Detection via Angular Kernel Statistics

arXiv.org Machine Learning

We study change-point detection for high-dimensional data in regimes where inference must be performed from small batches of observations. Our primary focus is the high-dimensional, low sample size (HDLSS) regime, where the sequence length is fixed while the ambient dimension diverges. We propose a dimension-averaged angular kernel scan framework for detecting marginal distributional shifts. The statistic aggregates bounded one-dimensional angular discrepancies across coordinates, yielding a fully nonparametric, hyperparameter-free, and moment-agnostic estimator that remains well-defined without specifying, estimating, or assuming finite marginal moments, for example under heavy-tailed or contaminated distributions. For the offline single-change problem, we derive an exact population mean factorization into a universal deterministic shape function and a scalar signal factor, characterize the null covariance structure up to a scalar long-run variance factor, and establish an HDLSS multivariate central limit theorem under cross-coordinate mixing. These results lead to plug-in Gaussian calibration, asymptotic type-I error control, and power and localization guarantees, including a $d^{-1/2}$ local detection scale. We further extend the offline procedure to a fixed-window sequential monitoring procedure for high-dimensional streaming data, and obtain ARL calibration and worst-case EDD bounds. Simulation studies demonstrate that the proposed method can accurately detect and localize changes in challenging HDLSS and streaming settings where moment-based or hyperparameter-sensitive procedures may be unreliable.


Statistical Inference for Stochastic Gradient Descent Beyond Finite Variance

arXiv.org Machine Learning

Stochastic gradient descent (SGD) is a foundational algorithm for large-scale statistical learning and stochastic optimization. However, statistical inference based on SGD iterates remains challenging when stochastic gradients have infinite variance, as the relevant limiting distributions depend on unknown nuisance parameters. In this paper, we develop an efficient, model-agnostic methodology for constructing confidence regions from SGD trajectories that applies in both finite- and infinite-variance regimes. The procedure is based on a joint weak convergence result for the Polyak-Ruppert averaged estimator and an empirical second-moment normalizer constructed from stochastic gradients along the SGD trajectory. This joint limit yields a self-normalized statistic in which the leading tail-dependent scaling terms cancel. We then use a subsampling calibration scheme to estimate the relevant critical values, avoiding explicit estimation of tail indices, slowly varying functions, or stable-law parameters. The resulting confidence regions are straightforward to implement and are asymptotically valid under both the finite- and infinite-second-moment regimes. Simulation studies show reliable coverage in various settings, supporting the proposed method as a practical tool for uncertainty quantification in stochastic optimization.
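
As background for the Polyak-Ruppert averaged estimator named in the abstract, here is a minimal sketch of averaged SGD on a toy one-dimensional problem (estimating a mean by minimizing a squared loss). It assumes Gaussian, finite-variance noise and an illustrative step-size exponent of 0.75. It does not implement the paper's self-normalized statistic or subsampling calibration, and the name `averaged_sgd_mean` is hypothetical.

```python
import random


def averaged_sgd_mean(samples, step_exponent=0.75):
    """Run SGD on 0.5 * (theta - x)^2 and return (last iterate, Polyak-Ruppert average)."""
    theta = 0.0
    running_avg = 0.0
    for t, x in enumerate(samples, start=1):
        step = t ** (-step_exponent)   # decaying step size t^(-0.75), an assumed choice
        grad = theta - x               # stochastic gradient of 0.5 * (theta - x)^2
        theta -= step * grad
        # Online update of the average of all iterates seen so far.
        running_avg += (theta - running_avg) / t
    return theta, running_avg


random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(5000)]
last_iterate, averaged = averaged_sgd_mean(data)
```

Averaging the iterates typically gives a smoother estimate than the last iterate alone, which is why the averaged estimator is the usual starting point for inference.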


Coupled Training with Privileged Information and Unlabeled Data

arXiv.org Machine Learning

In many prediction problems, we have extra information during training (for example, measurements that are expensive or slow to collect) that will not be available when the model is deployed. A common strategy is to first train a model that uses all training information, then use its predictions on unlabeled examples to train a second model that only uses the inputs available at test time. However, when the extra training-only information is weak or noisy, this Two-Stage approach can mislead the deployment model and even hurt accuracy. We propose a joint training method that learns the two models together, so the deployment model can benefit from the extra information only when it actually helps, instead of inheriting its mistakes. We provide guarantees that describe when joint training improves prediction accuracy and analyze a simple alternating training algorithm for large, high-dimensional models. Experiments on synthetic data and real-world prediction tasks show that our approach avoids these failures and robustly outperforms standard Two-Stage baselines.


Distribution-free root cause analysis

arXiv.org Machine Learning

We study distribution-free root cause analysis in multi-stream data, where an evolving underlying system is observed through multiple data streams that may each undergo distributional changes at unknown timepoints. In such settings, the stream exhibiting the earliest change provides a natural starting point for investigating the underlying cause, which we refer to as the root-cause index. Leveraging conformal $p$-values, we propose a novel framework, Conformal Root Cause Analysis (CROC), which constructs finite-sample valid confidence sets for the root-cause index under minimal assumptions: the data streams are independent, and within each stream the pre- and post-change observations are sampled exchangeably from arbitrary and unknown distributions. We further establish a universality property, showing that any distribution-free method for root cause localization can be represented within the CROC framework. In addition, under mild regularity conditions and principled score design, our method yields asymptotically sharp confidence sets that efficiently isolate the root cause. We further extend CROC to efficiently handle cross-stream dependence when present. Extensive simulations demonstrate accurate localization of the root stream, supporting our theoretical guarantees.
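
The conformal $p$-values that the abstract builds on can be illustrated with the standard textbook construction. This is a minimal sketch, assuming larger nonconformity scores mean "more unusual". It shows only the basic $p$-value, not the CROC confidence sets, and the function name is an illustrative assumption.

```python
def conformal_p_value(calibration_scores, test_score):
    """Standard conformal p-value: (1 + #{calibration scores >= test score}) / (n + 1)."""
    n = len(calibration_scores)
    # Count calibration points at least as unusual as the test point.
    at_least_as_extreme = sum(1 for s in calibration_scores if s >= test_score)
    return (1 + at_least_as_extreme) / (n + 1)


p_typical = conformal_p_value([0.1, 0.2, 0.3, 0.4], 0.35)
p_extreme = conformal_p_value([0.1, 0.2, 0.3, 0.4], 1.0)
```

When the test point is exchangeable with the calibration points, this $p$-value is valid in finite samples, with no assumption on the underlying distribution.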


Score-Based Causal Discovery of Latent Variable Causal Models

arXiv.org Machine Learning

Identifying latent variables and the causal structure involving them is essential across various scientific fields. While many existing works fall under the category of constraint-based methods (with e.g. conditional independence or rank deficiency tests), they may face empirical challenges such as testing-order dependency, error propagation, and choosing an appropriate significance level. These issues can potentially be mitigated by properly designed score-based methods, such as Greedy Equivalence Search (GES) (Chickering, 2002) in the specific setting without latent variables. Yet, formulating score-based methods with latent variables is highly challenging. In this work, we develop score-based methods that are capable of identifying causal structures containing causally-related latent variables with identifiability guarantees. Specifically, we show that a properly formulated scoring function can achieve score equivalence and consistency for structure learning of latent variable causal models. We further provide a characterization of the degrees of freedom for the marginal over the observed variables under multiple structural assumptions considered in the literature, and accordingly develop both exact and continuous score-based methods. This offers a unified view of several existing constraint-based methods with different structural assumptions. Experimental results validate the effectiveness of the proposed methods.


Minimax optimal submatrix detection: Sharp non-asymptotic rates

arXiv.org Machine Learning

Given an observation $\mathbf Y \in \mathbb{R}^{d_1\times d_2}$ from the model $\mathbf Y = \mathbf X + \mathbf E$ where $\mathbf X$ is constant and $\mathbf E$ has i.i.d. $N(0,1)$ entries, we consider the problem of detecting a planted submatrix in the mean matrix $\mathbf X$. Specifically, we aim to distinguish the null hypothesis $\mathbf X = 0$ from the alternative hypothesis in which $\mathbf X$ is non-zero only on a submatrix of size $s_1 \times s_2$ with elevated entries bounded below by $\mu>0$. We establish a minimax lower bound characterizing how large $\mu$ must be to ensure that the two hypotheses are distinguishable with high probability. Furthermore, we derive novel minimax-optimal tests achieving the lower bound, and describe extensions of these tests that are adaptive to unknown sparsity levels $s_1$ and $s_2$. In contrast with previous work, which required restrictive assumptions on $s_1, s_2, d_1$ and $d_2$, our non-asymptotic upper and lower bounds match for any configuration of these parameters.