AITopics | selective inference

A data analysis pipeline is a structured sequence of steps that transforms raw data into meaningful insights by integrating multiple analysis algorithms. In many practical applications, analytical findings are obtained only after data pass through several data-dependent procedures within such pipelines. In this study, we address the problem of quantifying the statistical reliability of results produced by data analysis pipelines. As a proof of concept, we focus on clustering pipelines that identify cluster structures from complex and heterogeneous data through procedures such as outlier detection, feature selection, and clustering. We propose a novel statistical testing framework to assess the significance of clustering results obtained through these pipelines. Our framework, based on selective inference, enables the systematic construction of valid statistical tests for clustering pipelines composed of predefined components. We prove that the proposed test controls the type I error rate at any nominal level and demonstrate its validity and effectiveness through experiments on synthetic and real datasets.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

2603.18413

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.96)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Selective inference for group-sparse linear models

Fan Yang, Rina Foygel Barber, Prateek Jain, John Lafferty

Neural Information Processing SystemsMar-23-2026, 13:39:26 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, inference, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (0.29)
Europe > Austria (0.28)

Genre: Research Report (0.97)

Industry: Health & Medicine (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Add feedback

cd706106802dbea2068efd7031c3b420-Paper-Conference.pdf

Neural Information Processing SystemsFeb-11-2026, 23:22:49 GMT

inference, segmentation result, selection event, (12 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (0.67)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Statistical inference after variable selection in Cox models: A simulation study

Schemet, Lena, Friedrich-Welz, Sarah

arXiv.org Machine LearningFeb-10-2026

Choosing relevant predictors is central to the analysis of biomedical time-to-event data. Classical frequentist inference, however, presumes that the set of covariates is fixed in advance and does not account for data-driven variable selection. As a consequence, naive post-selection inference may be biased and misleading. In right-censored survival settings, these issues may be further exacerbated by the additional uncertainty induced by censoring. We investigate several inference procedures applied after variable selection for the coefficients of the Lasso and its extension, the adaptive Lasso, in the context of the Cox model. The methods considered include sample splitting, exact post-selection inference, and the debiased Lasso. Their performance is examined in a neutral simulation study reflecting realistic covariate structures and censoring rates commonly encountered in biomedical applications. To complement the simulation results, we illustrate the practical behavior of these procedures in an applied example using a publicly available survival dataset.

artificial intelligence, machine learning, modeling & simulation, (19 more...)

arXiv.org Machine Learning

2602.07477

Country:

Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
Europe > Germany (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.93)
Law > Civil Rights & Constitutional Law (0.78)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

Quantifying Statistical Significance of Neural Network-based Image Segmentation by Selective Inference

Neural Information Processing SystemsDec-25-2025, 07:32:41 GMT

Although a vast body of literature relates to image segmentation methods that use deep neural networks (DNNs), less attention has been paid to assessing the statistical reliability of segmentation results. In this study, we interpret the segmentation results as hypotheses driven by DNN (called DNN-driven hypotheses) and propose a method to quantify the reliability of these hypotheses within a statistical hypothesis testing framework. To this end, we introduce a conditional selective inference (SI) framework---a new statistical inference framework for data-driven hypotheses that has recently received considerable attention---to compute exact (non-asymptotic) valid p-values for the segmentation results. To use the conditional SI framework for DNN-based segmentation, we develop a new SI algorithm based on the homotopy method, which enables us to derive the exact (non-asymptotic) sampling distribution of DNN-driven hypothesis. We conduct several experiments to demonstrate the performance of the proposed method.

hypothesis, neural network-based image segmentation, quantifying statistical significance, (6 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (0.41)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.65)

Add feedback

Computing Valid p-value for Optimal Changepoint by Selective Inference using Dynamic Programming

Neural Information Processing SystemsDec-24-2025, 06:06:06 GMT

Although there is a vast body of literature related to methods for detecting change-points (CPs), less attention has been paid to assessing the statistical reliability of the detected CPs. In this paper, we introduce a novel method to perform statistical inference on the significance of the CPs, estimated by a Dynamic Programming (DP)-based optimal CP detection algorithm. Our main idea is to employ a Selective Inference (SI) approach---a new statistical inference framework that has recently received a lot of attention---to compute exact (non-asymptotic) valid p-values for the detected optimal CPs. Although it is well-known that SI has low statistical power because of over-conditioning, we address this drawback by introducing a novel method called parametric DP, which enables SI to be conducted with the minimum amount of conditioning, leading to high statistical power. We conduct experiments on both synthetic and real-world datasets, through which we offer evidence that our proposed method is more powerful than existing methods, has decent performance in terms of computational efficiency, and provides good results in many practical applications.

computing valid p-value, dynamic programming, selective inference, (7 more...)

Neural Information Processing Systems

Genre: Research Report (0.87)

Technology: Information Technology > Artificial Intelligence (0.43)

Add feedback

$ϕ$-test: Global Feature Selection and Inference for Shapley Additive Explanations

Kim, Dongseok, Choi, Hyoungsun, Rasool, Mohamed Jismy Aashik, Oh, Gisung

arXiv.org Machine LearningDec-9-2025

We propose $ϕ$-test, a global feature-selection and significance procedure for black-box predictors that combines Shapley attributions with selective inference. Given a trained model and an evaluation dataset, $ϕ$-test performs SHAP-guided screening and fits a linear surrogate on the screened features via a selection rule with a tractable selective-inference form. For each retained feature, it outputs a Shapley-based global score, a surrogate coefficient, and post-selection $p$-values and confidence intervals in a global feature-importance table. Experiments on real tabular regression tasks with tree-based and neural backbones suggest that $ϕ$-test can retain much of the predictive ability of the original model while using only a few features and producing feature sets that remain fairly stable across resamples and backbone classes. In these settings, $ϕ$-test acts as a practical global explanation layer linking Shapley-based importance summaries with classical statistical inference.

confidence interval, explanation, global feature selection and inference, (9 more...)

arXiv.org Machine Learning

2512.07578

Genre: Research Report > Experimental Study (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Selective inference for group-sparse linear models

Neural Information Processing SystemsNov-21-2025, 14:53:37 GMT

We develop tools for selective inference in the setting of group sparsity, including the construction of confidence intervals and p-values for testing selected groups of variables. Our main technical result gives the precise distribution of the magnitude of the projection of the data onto a given subspace, and enables us to develop inference procedures for a broad class of group-sparse selection methods, including the group lasso, iterative hard thresholding, and forward stepwise regression. We give numerical results to illustrate these tools on simulated data and on health record data.

group-sparse linear model, name change, selective inference, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.62)

Add feedback

Decoding Memories: An Efficient Pipeline for Self-Consistency Hallucination Detection

Gao, Weizhi, Liu, Xiaorui, Wang, Feiyi, Lu, Dan, Yin, Junqi

arXiv.org Artificial IntelligenceSep-1-2025

Large language models (LLMs) have demonstrated impressive performance in both research and real-world applications, but they still struggle with hallucination. Existing hallucination detection methods often perform poorly on sentence-level generation or rely heavily on domain-specific knowledge. While self-consistency approaches help address these limitations, they incur high computational costs due to repeated generation. In this paper, we conduct the first study on identifying redundancy in self-consistency methods, manifested as shared prefix tokens across generations, and observe that non-exact-answer tokens contribute minimally to the semantic content. Based on these insights, we propose a novel Decoding Memory Pipeline (DMP) that accelerates generation through selective inference and annealed decoding. Being orthogonal to the model, dataset, decoding strategy, and self-consistency baseline, our DMP consistently improves the efficiency of multi-response generation and holds promise for extension to alignment and reasoning tasks. Extensive experiments show that our method achieves up to a 3x speedup without sacrificing AUROC performance.

arxiv preprint arxiv, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2508.21228

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Government > Regional Government (0.46)

Technology: