data splitting
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
- North America > United States > New York (0.04)
False Discovery Rate Control via Bayesian Mirror Statistic
Molinari, Marco, Thoresen, Magne
Simultaneously performing variable selection and inference in high-dimensional models is an open challenge in statistics and machine learning. The increasing availability of vast numbers of variables requires specific statistical procedures to accurately select the most important predictors in a high-dimensional space while controlling some form of selection error. In this work we adapt the Mirror Statistic approach to False Discovery Rate (FDR) control to a Bayesian modelling framework. The Mirror Statistic, developed in the classical frequentist framework, is a flexible method for FDR control that requires only mild model assumptions, but it needs two sets of independent regression coefficient estimates, usually obtained by splitting the original dataset. Here we propose to rely on a Bayesian formulation of the model and use the posterior distributions of the coefficients of interest to build the Mirror Statistic and effectively control the FDR without the need to split the data. Moreover, the method is very flexible: it can be used with continuous and discrete outcomes and with more complex predictors, as in mixed models. We keep the approach scalable to high dimensions by relying on Automatic Differentiation Variational Inference and fully continuous prior choices.
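As a concrete illustration of the selection rule, the minimal NumPy sketch below implements the generic mirror-statistic procedure: given two (approximately) independent coefficient estimates per feature, for instance the means of two disjoint halves of a posterior sample, it forms the mirror statistics and picks the smallest threshold whose estimated false discovery proportion stays below the target level. The function name and the posterior-halves construction are illustrative assumptions and do not reproduce the paper's exact Bayesian construction.

```python
import numpy as np

def mirror_fdr_select(beta1, beta2, q=0.1):
    """Mirror-statistic selection at nominal FDR level q.

    beta1, beta2: length-p arrays holding two (approximately) independent
    estimates of the same regression coefficients, e.g. the means of two
    disjoint halves of a posterior sample (an illustrative choice, not
    necessarily the construction used in the paper).
    """
    # Large positive when the two estimates agree in sign and are both large;
    # roughly symmetric about zero for null coefficients.
    m = np.sign(beta1 * beta2) * (np.abs(beta1) + np.abs(beta2))

    # Smallest threshold t whose estimated false discovery proportion
    # #{j : m_j <= -t} / max(#{j : m_j >= t}, 1) falls below q.
    for t in np.sort(np.abs(m[m != 0])):
        fdp_hat = np.sum(m <= -t) / max(np.sum(m >= t), 1)
        if fdp_hat <= q:
            return np.flatnonzero(m >= t)
    return np.array([], dtype=int)

# Toy example: 200 coefficients, the first 10 genuinely non-zero.
rng = np.random.default_rng(0)
truth = np.zeros(200)
truth[:10] = 3.0
b1 = truth + rng.normal(scale=0.5, size=200)
b2 = truth + rng.normal(scale=0.5, size=200)
print(mirror_fdr_select(b1, b2, q=0.1))
```

The rule works because null mirror statistics are roughly symmetric about zero, so the count of statistics below -t serves as an estimate of the number of false discoveries above t.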
- Europe > Norway > Eastern Norway > Oslo (0.04)
- Asia > Middle East > Jordan (0.04)
- Health & Medicine > Consumer Health (0.67)
- Education > Health & Safety > School Nutrition (0.67)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)
Chiseling: Powerful and Valid Subgroup Selection via Interactive Machine Learning
Cheng, Nathan, Spector, Asher, Janson, Lucas
In regression and causal inference, controlled subgroup selection aims to identify, with inferential guarantees, a subgroup (defined as a subset of the covariate space) on which the average response or treatment effect is above a given threshold. E.g., in a clinical trial, it may be of interest to find a subgroup with a positive average treatment effect. However, existing methods either lack inferential guarantees, heavily restrict the search for the subgroup, or sacrifice efficiency by naive data splitting. We propose a novel framework called chiseling that allows the analyst to interactively refine and test a candidate subgroup by iteratively shrinking it. The sole restriction is that the shrinkage direction only depends on the points outside the current subgroup, but otherwise the analyst may leverage any prior information or machine learning algorithm. Despite this flexibility, chiseling controls the probability that the discovered subgroup is null (e.g., has a non-positive average treatment effect) under minimal assumptions: for example, in randomized experiments, this inferential validity guarantee holds under only bounded moment conditions. When applied to a variety of simulated datasets and a real survey experiment, chiseling identifies substantially better subgroups than existing methods with inferential guarantees.
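To make the iterative-shrinkage idea concrete, here is a heavily simplified toy sketch of a chiseling-style loop: the candidate subgroup starts as the full sample and is peeled one point at a time, with the point to drop scored using only the responses of points already outside the subgroup (all covariates are used for distances, a simplifying assumption here). The nearest-neighbour scoring, the stopping rule, and the final mean check are placeholders and do not implement the paper's valid sequential test.

```python
import numpy as np

def chisel_sketch(X, y, threshold=0.0, min_size=20, k=5):
    """Toy 'peeling' loop in the spirit of chiseling (illustrative only).

    At every step the point removed from the candidate subgroup is chosen
    using the responses of points already OUTSIDE the subgroup, mimicking
    the restriction that the shrinkage direction depend only on outside
    points. The stopping rule below is a naive placeholder.
    """
    n = len(y)
    inside = np.ones(n, dtype=bool)
    while inside.sum() > min_size:
        if y[inside].mean() > threshold:
            break  # placeholder stop: subgroup mean clears the threshold
        out_X, out_y = X[~inside], y[~inside]
        if len(out_y) == 0:
            # No outside points yet: seed the process by dropping point 0.
            drop = 0
        else:
            # Score inside points by the mean response of their k nearest
            # outside neighbours; drop the least promising one.
            d = np.linalg.norm(X[:, None, :] - out_X[None, :, :], axis=2)
            kk = min(k, len(out_y))
            nn = np.argsort(d, axis=1)[:, :kk]
            scores = out_y[nn].mean(axis=1)
            scores[~inside] = np.inf
            drop = int(np.argmin(scores))
        inside[drop] = False
    return inside

# Toy data: the effect is positive only where the first covariate is positive.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(float) + rng.normal(scale=0.5, size=200)
subgroup = chisel_sketch(X, y, threshold=0.6)
print(subgroup.sum(), round(y[subgroup].mean(), 3))
```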
- North America > United States > New York (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > India (0.04)
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.67)
What Is That Talk About? A Video-to-Text Summarization Dataset for Scientific Presentations
Liu, Dongqi, Whitehouse, Chenxi, Yu, Xi, Mahon, Louis, Saxena, Rohit, Zhao, Zheng, Qiu, Yifu, Lapata, Mirella, Demberg, Vera
Transforming recorded videos into concise and accurate textual summaries is a growing challenge in multimodal learning. This paper introduces VISTA, a dataset specifically designed for video-to-text summarization in scientific domains. VISTA contains 18,599 recorded AI conference presentations paired with their corresponding paper abstracts. We benchmark the performance of state-of-the-art large models and apply a plan-based framework to better capture the structured nature of abstracts. Both human and automated evaluations confirm that explicit planning enhances summary quality and factual consistency. However, a considerable gap remains between models and human performance, highlighting the challenges of scientific video summarization.
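As a hedged illustration of the kind of automatic evaluation such a benchmark typically reports, the snippet below computes a crude unigram-overlap F1 between a generated summary and a reference abstract. The record layout and field names are hypothetical rather than the actual VISTA schema, and the metric is only a stand-in for standard summarization metrics such as ROUGE.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 between a generated summary and the reference
    abstract; a crude stand-in for the automatic metrics (e.g. ROUGE)
    usually reported on summarization benchmarks."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0 or not pred or not ref:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# Hypothetical record layout: each item pairs a recorded talk with its paper
# abstract. Field names are illustrative, not the dataset's actual schema.
example = {
    "video_id": "talk_00001",
    "abstract": "This paper introduces a dataset for video-to-text "
                "summarization of scientific presentations.",
}
generated = "We present a video to text summarization dataset for scientific talks."
print(round(token_f1(generated, example["abstract"]), 3))
```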
- Asia > Thailand (0.14)
- North America > Canada (0.14)
- Europe > Middle East > Malta (0.14)
- (8 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
I have read the paper "Fast Classification Rates for High-Dimensional Conditional Gaussian Models". The paper studies binary classification under a Gaussian model and provides theoretical results on the convergence of the classification error rate relative to that of the Bayes classifier. The paper presents some nice theoretical results and is interesting to some extent. I am generally positive about the paper, but I have the following concerns. The first concerns practical relevance.
Review for NeurIPS paper: Learning Kernel Tests Without Data Splitting
Additional Feedback: Minor comments
- l. 55: the observed test statistic \hat{\tau} is never really defined, in particular what it is taken with respect to; this might be a bit confusing.
- I think Lemma 1 needs rephrasing. By definition the expectation of \tau is already that of h, so nothing needs to be assumed here, and the variance of \tau is \sigma^2 / n. Since \tau is already defined in the previous paragraph, writing "Let \mu denote E[h] and \sigma^2 denote Var(h)" seems enough.
- Is there a missing \mid H_A in Eq. (1)?
- l. 137: the e_j vectors are not defined, although the meaning remains understandable.
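Written out, the reviewer's remark about Lemma 1 amounts to the following, assuming (as the comment suggests) that \tau is the average of n i.i.d. terms distributed like h:

```latex
\[
  \tau = \frac{1}{n}\sum_{i=1}^{n} h_i,
  \qquad
  \mathbb{E}[\tau] = \mathbb{E}[h] = \mu,
  \qquad
  \operatorname{Var}(\tau) = \frac{\operatorname{Var}(h)}{n} = \frac{\sigma^2}{n},
\]
```

so no additional assumption on the mean of \tau is needed once \mu = E[h] and \sigma^2 = Var(h) are defined.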