distribution function

Topics:
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Information Technology > Modeling & Simulation (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Locations:
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > China > Shandong Province > Qingdao (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
- North America > Canada (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Oceania > Australia (0.04)
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- Europe > Spain > Andalusia > Cádiz Province > Cadiz (0.04)
- Europe > Germany > Berlin (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (5 more...)
- (4 more...)
Minimum Wasserstein distance estimator under covariate shift: closed-form, super-efficiency and irregularity
Junjun Lang, Qiong Zhang, Yukun Liu
Covariate shift arises when covariate distributions differ between source and target populations while the conditional distribution of the response remains invariant, and it underlies problems in missing data and causal inference. We propose a minimum Wasserstein distance estimation framework for inference under covariate shift that avoids explicit modeling of outcome regressions or importance weights. The resulting W-estimator admits a closed-form expression and is numerically equivalent to the classical 1-nearest neighbor estimator, yielding a new optimal transport interpretation of nearest neighbor methods. We establish root-$n$ asymptotic normality and show that the estimator is not asymptotically linear, leading to super-efficiency relative to the semiparametric efficient estimator under covariate shift in certain regimes, and uniformly in missing data problems. Numerical simulations, along with an analysis of a rainfall dataset, underscore the exceptional performance of our W-estimator.
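The abstract's key computational claim, that the closed-form W-estimator coincides numerically with the classical 1-nearest-neighbor estimator, is easy to illustrate. The sketch below is a minimal toy, not the authors' code: the target-mean estimand, Euclidean metric, and all names are assumptions made for illustration.

```python
# Illustrative sketch: 1-nearest-neighbor imputation under covariate
# shift. Per the abstract, the minimum-Wasserstein W-estimator is
# numerically equivalent to this classical estimator; everything below
# (estimand, metric, toy data) is an assumed setup, not the paper's code.
import numpy as np

def one_nn_mean_estimate(x_source, y_source, x_target):
    """Estimate the target-population mean of Y by matching each target
    covariate to its nearest source covariate and averaging the matched
    source responses."""
    x_source = np.asarray(x_source, dtype=float)
    y_source = np.asarray(y_source, dtype=float)
    x_target = np.asarray(x_target, dtype=float)
    # Pairwise Euclidean distances, shape (n_target, n_source).
    dists = np.linalg.norm(x_target[:, None, :] - x_source[None, :, :], axis=-1)
    nearest = dists.argmin(axis=1)     # nearest source index per target point
    return y_source[nearest].mean()    # average the imputed responses

# Toy check: Y = 2X + noise on the source; target covariates are shifted.
rng = np.random.default_rng(0)
x_src = rng.normal(0.0, 1.0, size=(2000, 1))
y_src = 2.0 * x_src[:, 0] + rng.normal(0.0, 0.1, size=2000)
x_tgt = rng.normal(0.5, 1.0, size=(2000, 1))   # covariate shift: mean moves to 0.5
print(one_nn_mean_estimate(x_src, y_src, x_tgt))  # ≈ 2 × 0.5 = 1.0
```

Because the conditional law of Y given X is shared across populations, matching on covariates transfers source responses to the target design, which is exactly the regime the estimator addresses.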
Locations:
- Asia > Bangladesh (0.04)
- North America > United States > New York (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- (3 more...)
How Much Data Is Enough? Uniform Convergence Bounds for Generative & Vision-Language Models under Low-Dimensional Structure
Modern generative and vision-language models (VLMs) are increasingly used in scientific and medical decision support, where predicted probabilities must be both accurate and well calibrated. Despite strong empirical results with moderate data, it remains unclear when such predictions generalize uniformly across inputs, classes, or subpopulations, rather than only on average, a critical issue in biomedicine, where rare conditions and specific groups can exhibit large errors even when overall loss is low. We study this question from a finite-sample perspective and ask: under what structural assumptions can generative and VLM-based predictors achieve uniformly accurate and calibrated behavior with practical sample sizes? Rather than analyzing arbitrary parameterizations, we focus on induced families of classifiers obtained by varying prompts or semantic embeddings within a restricted representation space. When model outputs depend smoothly on a low-dimensional semantic representation (an assumption supported by spectral structure in text and joint image-text embeddings), classical uniform convergence tools yield meaningful non-asymptotic guarantees. Our main results give finite-sample uniform convergence bounds for accuracy and calibration functionals of VLM-induced classifiers under Lipschitz stability with respect to prompt embeddings. The implied sample complexity depends on the intrinsic (effective) dimension rather than the ambient embedding dimension, and we further derive spectrum-dependent bounds that make explicit how eigenvalue decay governs data requirements. We conclude with implications for data-limited biomedical modeling, including when current dataset sizes can support uniformly reliable predictions and why average calibration metrics may miss worst-case miscalibration.
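The claim that sample complexity is governed by intrinsic rather than ambient dimension can be made concrete with a toy spectrum. The sketch below is an illustration only: the participation ratio is one common proxy for effective dimension, chosen here as an assumption, and the paper's spectrum-dependent quantities may be defined differently.

```python
# Illustrative toy: fast eigenvalue decay makes the "effective" dimension
# of an embedding space far smaller than its ambient dimension. The
# participation ratio is a common proxy, assumed for this sketch; it is
# not taken from the paper.
import numpy as np

def participation_ratio(embeddings):
    """(sum λ_i)^2 / sum λ_i^2 over the covariance spectrum: equals d for
    isotropic d-dimensional data, stays small when a few directions dominate."""
    lam = np.linalg.eigvalsh(np.cov(embeddings, rowvar=False))
    lam = np.clip(lam, 0.0, None)   # guard against tiny negative eigenvalues
    return lam.sum() ** 2 / (lam ** 2).sum()

rng = np.random.default_rng(0)
ambient_dim = 512
# Synthetic "prompt embeddings" with power-law spectrum λ_i ∝ i^(-2),
# mimicking the fast spectral decay reported for text embedding spaces.
stds = np.arange(1, ambient_dim + 1, dtype=float) ** -1.0
z = rng.normal(size=(5000, ambient_dim)) * stds
print(ambient_dim, round(participation_ratio(z), 1))  # e.g. 512 vs ≈ 2.5
```

Under the abstract's Lipschitz-stability assumption, uniform convergence rates scale with this small effective dimension, which is why moderately sized biomedical datasets may still support uniform guarantees.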
Topics:
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)