data splitting
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
- North America > United States > New York (0.04)
False Discovery Rate Control via Bayesian Mirror Statistic
Molinari, Marco, Thoresen, Magne
Simultaneously performing variable selection and inference in high-dimensional models is an open challenge in statistics and machine learning. The increasing availability of vast numbers of variables requires specific statistical procedures to accurately select the most important predictors in a high-dimensional space while controlling some form of selection error. In this work we adapt the Mirror Statistic approach to False Discovery Rate (FDR) control to a Bayesian modelling framework. The Mirror Statistic, developed in the classical frequentist framework, is a flexible method for FDR control that requires only mild model assumptions, but it needs two sets of independent regression coefficient estimates, usually obtained by splitting the original dataset. Here we propose to rely on a Bayesian formulation of the model and use the posterior distributions of the coefficients of interest to build the Mirror Statistic and effectively control the FDR without the need to split the data. Moreover, the method is very flexible: it can be used with continuous and discrete outcomes and with more complex predictors, as in mixed models. We keep the approach scalable to high dimensions by relying on Automatic Differentiation Variational Inference and fully continuous prior choices.
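As a concrete illustration of the selection rule, the minimal NumPy sketch below implements the generic mirror-statistic procedure: given two (approximately) independent coefficient estimates per feature, for instance the means of two disjoint halves of a posterior sample, it forms the mirror statistics and picks the smallest threshold whose estimated false discovery proportion stays below the target level. The function name and the posterior-halves construction are illustrative assumptions and do not reproduce the paper's exact Bayesian construction.

```python
import numpy as np

def mirror_fdr_select(beta1, beta2, q=0.1):
    """Mirror-statistic selection at nominal FDR level q.

    beta1, beta2: length-p arrays holding two (approximately) independent
    estimates of the same regression coefficients, e.g. the means of two
    disjoint halves of a posterior sample (an illustrative choice, not
    necessarily the construction used in the paper).
    """
    # Large positive when the two estimates agree in sign and are both large;
    # roughly symmetric about zero for null coefficients.
    m = np.sign(beta1 * beta2) * (np.abs(beta1) + np.abs(beta2))

    # Smallest threshold t whose estimated false discovery proportion
    # #{j : m_j <= -t} / max(#{j : m_j >= t}, 1) falls below q.
    for t in np.sort(np.abs(m[m != 0])):
        fdp_hat = np.sum(m <= -t) / max(np.sum(m >= t), 1)
        if fdp_hat <= q:
            return np.flatnonzero(m >= t)
    return np.array([], dtype=int)

# Toy example: 200 coefficients, the first 10 genuinely non-zero.
rng = np.random.default_rng(0)
truth = np.zeros(200)
truth[:10] = 3.0
b1 = truth + rng.normal(scale=0.5, size=200)
b2 = truth + rng.normal(scale=0.5, size=200)
print(mirror_fdr_select(b1, b2, q=0.1))
```

The rule works because null mirror statistics are roughly symmetric about zero, so the count of statistics below -t serves as an estimate of the number of false discoveries above t.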
- Europe > Norway > Eastern Norway > Oslo (0.04)
- Asia > Middle East > Jordan (0.04)
- Health & Medicine > Consumer Health (0.67)
- Education > Health & Safety > School Nutrition (0.67)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)
Chiseling: Powerful and Valid Subgroup Selection via Interactive Machine Learning
Cheng, Nathan, Spector, Asher, Janson, Lucas
In regression and causal inference, controlled subgroup selection aims to identify, with inferential guarantees, a subgroup (defined as a subset of the covariate space) on which the average response or treatment effect is above a given threshold. E.g., in a clinical trial, it may be of interest to find a subgroup with a positive average treatment effect. However, existing methods either lack inferential guarantees, heavily restrict the search for the subgroup, or sacrifice efficiency by naive data splitting. We propose a novel framework called chiseling that allows the analyst to interactively refine and test a candidate subgroup by iteratively shrinking it. The sole restriction is that the shrinkage direction only depends on the points outside the current subgroup, but otherwise the analyst may leverage any prior information or machine learning algorithm. Despite this flexibility, chiseling controls the probability that the discovered subgroup is null (e.g., has a non-positive average treatment effect) under minimal assumptions: for example, in randomized experiments, this inferential validity guarantee holds under only bounded moment conditions. When applied to a variety of simulated datasets and a real survey experiment, chiseling identifies substantially better subgroups than existing methods with inferential guarantees.
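To make the iterative-shrinkage idea concrete, here is a heavily simplified toy sketch of a chiseling-style loop: the candidate subgroup starts as the full sample and is peeled one point at a time, with the point to drop scored using only the responses of points already outside the subgroup (all covariates are used for distances, a simplifying assumption here). The nearest-neighbour scoring, the stopping rule, and the final mean check are placeholders and do not implement the paper's valid sequential test.

```python
import numpy as np

def chisel_sketch(X, y, threshold=0.0, min_size=20, k=5):
    """Toy 'peeling' loop in the spirit of chiseling (illustrative only).

    At every step the point removed from the candidate subgroup is chosen
    using the responses of points already OUTSIDE the subgroup, mimicking
    the restriction that the shrinkage direction depend only on outside
    points. The stopping rule below is a naive placeholder.
    """
    n = len(y)
    inside = np.ones(n, dtype=bool)
    while inside.sum() > min_size:
        if y[inside].mean() > threshold:
            break  # placeholder stop: subgroup mean clears the threshold
        out_X, out_y = X[~inside], y[~inside]
        if len(out_y) == 0:
            # No outside points yet: seed the process by dropping point 0.
            drop = 0
        else:
            # Score inside points by the mean response of their k nearest
            # outside neighbours; drop the least promising one.
            d = np.linalg.norm(X[:, None, :] - out_X[None, :, :], axis=2)
            kk = min(k, len(out_y))
            nn = np.argsort(d, axis=1)[:, :kk]
            scores = out_y[nn].mean(axis=1)
            scores[~inside] = np.inf
            drop = int(np.argmin(scores))
        inside[drop] = False
    return inside

# Toy data: the effect is positive only where the first covariate is positive.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(float) + rng.normal(scale=0.5, size=200)
subgroup = chisel_sketch(X, y, threshold=0.6)
print(subgroup.sum(), round(y[subgroup].mean(), 3))
```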
- North America > United States > New York (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > India (0.04)
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.67)
What Is That Talk About? A Video-to-Text Summarization Dataset for Scientific Presentations
Liu, Dongqi, Whitehouse, Chenxi, Yu, Xi, Mahon, Louis, Saxena, Rohit, Zhao, Zheng, Qiu, Yifu, Lapata, Mirella, Demberg, Vera
Transforming recorded videos into concise and accurate textual summaries is a growing challenge in multimodal learning. This paper introduces VISTA, a dataset specifically designed for video-to-text summarization in scientific domains. VISTA contains 18,599 recorded AI conference presentations paired with their corresponding paper abstracts. We benchmark the performance of state-of-the-art large models and apply a plan-based framework to better capture the structured nature of abstracts. Both human and automated evaluations confirm that explicit planning enhances summary quality and factual consistency. However, a considerable gap remains between models and human performance, highlighting the challenges of scientific video summarization.
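As a hedged illustration of the kind of automatic evaluation such a benchmark typically reports, the snippet below computes a crude unigram-overlap F1 between a generated summary and a reference abstract. The record layout and field names are hypothetical rather than the actual VISTA schema, and the metric is only a stand-in for standard summarization metrics such as ROUGE.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 between a generated summary and the reference
    abstract; a crude stand-in for the automatic metrics (e.g. ROUGE)
    usually reported on summarization benchmarks."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0 or not pred or not ref:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# Hypothetical record layout: each item pairs a recorded talk with its paper
# abstract. Field names are illustrative, not the dataset's actual schema.
example = {
    "video_id": "talk_00001",
    "abstract": "This paper introduces a dataset for video-to-text "
                "summarization of scientific presentations.",
}
generated = "We present a video to text summarization dataset for scientific talks."
print(round(token_f1(generated, example["abstract"]), 3))
```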
- Asia > Thailand (0.14)
- North America > Canada (0.14)
- Europe > Middle East > Malta (0.14)
- (8 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
I have read the paper "Fast Classification Rates for High-Dimensional Conditional Gaussian Models". The paper studies binary classification under a Gaussian model and provides theoretical results on the convergence of the classification error rate relative to that of the Bayes classifier. The paper presents some nice theoretical results and is interesting to some extent. I am generally positive about the paper, but I have the following concerns. The first concerns practical relevance.
Review for NeurIPS paper: Learning Kernel Tests Without Data Splitting
Additional Feedback: Minor comments
- l. 55: the observed test statistic \hat{\tau} is never really defined, in particular what it is taken with respect to; this might be a bit confusing.
- I think Lemma 1 needs rephrasing. By definition the expectation of \tau is already that of h, so nothing needs to be assumed here, and the variance of \tau is \sigma^2 / n. Since \tau is already defined in the previous paragraph, writing "Let \mu denote E[h] and \sigma^2 denote Var(h)" seems enough.
- Is there a missing \mid H_A in Eq. (1)?
- l. 137: the e_j vectors are not defined, although the meaning remains understandable.
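Written out, the reviewer's remark about Lemma 1 amounts to the following, assuming (as the comment suggests) that \tau is the average of n i.i.d. terms distributed like h:

```latex
\[
  \tau = \frac{1}{n}\sum_{i=1}^{n} h_i,
  \qquad
  \mathbb{E}[\tau] = \mathbb{E}[h] = \mu,
  \qquad
  \operatorname{Var}(\tau) = \frac{\operatorname{Var}(h)}{n} = \frac{\sigma^2}{n},
\]
```

so no additional assumption on the mean of \tau is needed once \mu = E[h] and \sigma^2 = Var(h) are defined.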