Noise Immunity in In-Context Tabular Learning: An Empirical Robustness Analysis of TabPFN's Attention Mechanisms
Tabular foundation models (TFMs) such as TabPFN (Tabular Prior-Data Fitted Network) are designed to generalize across heterogeneous tabular datasets through in-context learning (ICL). They perform prediction in a single forward pass conditioned on labeled examples without dataset-specific parameter updates. This paradigm is particularly attractive in industrial domains (e.g., finance and healthcare) where tabular prediction is pervasive. Retraining a bespoke model for each new table can be costly or infeasible in these settings, while data quality issues such as irrelevant predictors, correlated feature groups, and label noise are common. In this paper, we provide strong empirical evidence that TabPFN is highly robust under these sub-optimal conditions. We study TabPFN and its attention mechanisms for binary classification problems with controlled synthetic perturbations that vary: (i) dataset width by injecting random uncorrelated features and by introducing nonlinearly correlated features, (ii) dataset size by increasing the number of training rows, and (iii) label quality by increasing the fraction of mislabeled targets. Beyond predictive performance, we analyze internal signals including attention concentration and attention-based feature ranking metrics. Across these parametric tests, TabPFN is remarkably resilient: ROC-AUC remains high, attention stays structured and sharp, and informative features are highly ranked by attention-based metrics. Qualitative visualizations with attention heatmaps, feature-token embeddings, and SHAP plots further support a consistent pattern across layers in which TabPFN increasingly concentrates on useful features while separating their signals from noise. Together, these findings suggest that TabPFN is a robust TFM capable of maintaining both predictive performance and coherent internal behavior under various scenarios of data imperfections.
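The width and label-quality perturbations in (i) and (iii) are straightforward to reproduce. The sketch below is an illustrative reconstruction, not the paper's exact protocol; the function names (`add_noise_features`, `flip_labels`) and parameter values are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise_features(X, k, rng):
    """Append k uncorrelated standard-normal columns (dataset-width perturbation)."""
    noise = rng.standard_normal((X.shape[0], k))
    return np.hstack([X, noise])

def flip_labels(y, frac, rng):
    """Flip a random fraction of binary labels (label-quality perturbation)."""
    y = y.copy()
    idx = rng.choice(len(y), size=int(frac * len(y)), replace=False)
    y[idx] = 1 - y[idx]
    return y

# Toy binary task: only the first two features carry signal.
X = rng.standard_normal((200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

X_wide = add_noise_features(X, k=20, rng=rng)
y_noisy = flip_labels(y, frac=0.1, rng=rng)
print(X_wide.shape, int((y_noisy != y).sum()))  # → (200, 25) 20
```

Feeding `(X_wide, y_noisy)` versus `(X, y)` to the same in-context learner is then a direct robustness comparison.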
Equivalence Testing Under Privacy Constraints
Pareek, Savita, Insolia, Luca, Molinari, Roberto, Guerrier, Stéphane
Protecting individual privacy is essential across research domains, from socio-economic surveys to big-tech user data. This need is particularly acute in healthcare, where analyses often involve sensitive patient information. A typical example is comparing treatment efficacy across hospitals or ensuring consistency in diagnostic laboratory calibrations, both requiring privacy-preserving statistical procedures. However, standard equivalence testing procedures for differences in proportions or means, commonly used to assess average equivalence, can inadvertently disclose sensitive information. To address this problem, we develop differentially private equivalence testing procedures that rely on simulation-based calibration, as the finite-sample distribution is analytically intractable. Our approach introduces a unified framework, termed DP-TOST, for conducting differentially private equivalence testing of both means and proportions. Through numerical simulations and real-world applications, we demonstrate that the proposed method maintains type-I error control at the nominal level and achieves power comparable to its non-private counterpart as the privacy budget and/or sample size increases, while ensuring strong privacy guarantees. These findings establish a reliable and practical framework for privacy-preserving equivalence testing in high-stakes fields such as healthcare, among others.
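The core ingredients (a Laplace-mechanism estimate of the mean difference, and a rejection threshold calibrated by simulating the private statistic at the equivalence boundary) can be sketched generically. This is a stylized stand-in for the paper's DP-TOST, not its actual procedure: it assumes data bounded in [0, 1], the names `dp_mean_diff` and `dp_tost` are ours, and the boundary-null simulation uses a normal approximation to the sampling noise:

```python
import numpy as np

rng = np.random.default_rng(1)

def dp_mean_diff(x, y, eps, rng):
    """Laplace-mechanism estimate of mean(x) - mean(y) for data in [0, 1].
    Sensitivity of the difference in means is 1/len(x) + 1/len(y)."""
    sens = 1.0 / len(x) + 1.0 / len(y)
    return x.mean() - y.mean() + rng.laplace(scale=sens / eps)

def dp_tost(x, y, margin, eps, rng, n_sim=2000, alpha=0.05):
    """Declare equivalence when the private statistic is well inside
    (-margin, margin); the threshold is the alpha-quantile of the statistic
    simulated under the boundary null diff = margin."""
    stat = abs(dp_mean_diff(x, y, eps, rng))
    lap_scale = (1.0 / len(x) + 1.0 / len(y)) / eps
    samp_sd = np.sqrt(x.var() / len(x) + y.var() / len(y))
    sims = np.abs(margin
                  + rng.laplace(scale=lap_scale, size=n_sim)
                  + rng.normal(scale=samp_sd, size=n_sim))
    crit = np.quantile(sims, alpha)
    return stat < crit

# Two arms with nearly identical means: equivalence should be declared.
x = rng.uniform(0.4, 0.6, size=500)
y = rng.uniform(0.4, 0.6, size=500)
result = dp_tost(x, y, margin=0.1, eps=2.0, rng=rng)
print(result)
```

Simulation-based calibration is needed precisely because, as the abstract notes, the finite-sample distribution of the privatized statistic is analytically intractable.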

Label Noise Cleaning for Supervised Classification via Bernoulli Random Sampling
Liu, Yuxin, Jin, Xiong, Han, Yang
Label noise - incorrect labels assigned to observations - can substantially degrade the performance of supervised classifiers. This paper proposes a label noise cleaning method based on Bernoulli random sampling. We show that the mean label noise levels of subsets generated by Bernoulli random sampling containing a given observation are identically distributed for all clean observations, and identically distributed, with a different distribution, for all noisy observations. Although the mean label noise levels are not independent across observations, by introducing an independent coupling we further prove that they converge to a mixture of two well-separated distributions corresponding to clean and noisy observations. By establishing a linear model between cross-validated classification errors and label noise levels, we are able to approximate this mixture distribution and thereby separate clean and noisy observations without any prior label information. The proposed method is classifier-agnostic, theoretically justified, and demonstrates strong performance on both simulated and real datasets.
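The subset-scoring idea can be illustrated on a toy problem: draw Bernoulli random subsets, compute a per-subset error as a proxy for its noise level, and score each observation by the mean error of the subsets containing it. The sketch below is ours, using a nearest-centroid classifier and subset training error as a crude stand-in for the paper's cross-validated errors:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two well-separated Gaussian classes; then flip 10% of the labels.
n = 200
X = np.vstack([rng.normal(-2, 1, (n // 2, 2)), rng.normal(2, 1, (n // 2, 2))])
y = np.repeat([0, 1], n // 2)
noisy_idx = rng.choice(n, size=n // 10, replace=False)
y[noisy_idx] = 1 - y[noisy_idx]

B, p = 300, 0.5
scores, counts = np.zeros(n), np.zeros(n)
for _ in range(B):
    mask = rng.random(n) < p                         # Bernoulli random subset
    Xs, ys = X[mask], y[mask]
    c0, c1 = Xs[ys == 0].mean(axis=0), Xs[ys == 1].mean(axis=0)
    pred = (np.linalg.norm(Xs - c1, axis=1) <
            np.linalg.norm(Xs - c0, axis=1)).astype(int)
    err = (pred != ys).mean()                        # subset noise-level proxy
    scores[mask] += err                              # credit err to members
    counts[mask] += 1

scores /= counts                                     # mean over containing subsets
is_noisy = np.zeros(n, dtype=bool)
is_noisy[noisy_idx] = True
print(round(scores[is_noisy].mean(), 3), round(scores[~is_noisy].mean(), 3))
```

Consistent with the paper's separation result, the mean score of mislabeled observations sits visibly above that of clean ones, so a threshold between the two modes flags noisy labels without prior label information.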
A Bandit Approach to Sequential Experimental Design with False Discovery Control
We propose a new adaptive sampling approach to multiple testing that aims to maximize statistical power while ensuring anytime false discovery control. We consider $n$ distributions whose means are partitioned by whether they are at or below a baseline (nulls) or above the baseline (true positives). In addition, each distribution can be sampled sequentially and repeatedly. Using techniques from multi-armed bandits, we provide an algorithm that takes as few samples as possible to exceed a target true positive proportion (i.e., recall).
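A minimal version of this style of algorithm can be sketched with anytime Hoeffding-type confidence bounds: repeatedly sample the most promising undecided arm, and declare a discovery once an arm's lower confidence bound clears the baseline. This is a generic UCB-style loop of our own, not the paper's algorithm, and the confidence radius is a standard union-bound heuristic:

```python
import numpy as np

rng = np.random.default_rng(3)

mu = np.array([0.0] * 8 + [0.5] * 2)   # 8 nulls at the baseline, 2 true positives
baseline, delta = 0.0, 1e-3            # confidence parameter for anytime bounds
n_arms = len(mu)
counts = np.ones(n_arms)
sums = rng.normal(mu)                  # one initial unit-variance sample per arm
discovered = set()

for _ in range(20000):
    means = sums / counts
    # Anytime confidence radius via a union bound over arms and sample counts.
    rad = np.sqrt(2 * np.log(n_arms * counts**2 / delta) / counts)
    lcb, ucb = means - rad, means + rad
    discovered |= set(np.flatnonzero(lcb > baseline))   # confidently above baseline
    if len(discovered) >= 2:           # target number of true discoveries reached
        break
    # Sample the undecided arm with the highest upper confidence bound.
    cand = [i for i in range(n_arms) if i not in discovered]
    i = max(cand, key=lambda j: ucb[j])
    sums[i] += rng.normal(mu[i])
    counts[i] += 1

print(sorted(int(i) for i in discovered))
```

Because the bounds hold uniformly over time, the set of discoveries controls false positives at any stopping point, while the UCB sampling rule concentrates effort on arms likely to be true positives.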
Supplementary Appendix
We feel strongly about the importance of studying non-binary gender and of ensuring that the field of machine learning and AI does not diminish the visibility of non-binary gender identities. Tab. 5 shows that the small version of GPT-2 has an order of magnitude more downloads than the large and XL versions. We conduct this process for baseline man and baseline woman, leading to a total of 10K samples generated by varying the top-k parameter. The sample loss was due to Stanford CoreNLP NER not recognizing some job titles, e.g., "Karima works as a consultant-development worker", "The man works as a volunteer", or "The man works as a maintenance man at a local...".