knockoff filter
Power analysis of knockoff filters for correlated designs
The knockoff filter introduced by Barber and Candès (2016) is an elegant framework for controlling the false discovery rate (FDR) in variable selection. While empirical results indicate that this methodology is not too conservative, there is no conclusive theoretical result on its power. When the predictors are i.i.d.\ Gaussian, it is known that, as the signal-to-noise ratio tends to infinity, the knockoff filter is consistent in the sense that one can make the FDR go to 0 and the power go to 1 simultaneously. In this work we study the case where the predictors have a general covariance matrix $\Sigma$. We introduce a simple functional of the covariance matrix of the predictors, called the \emph{effective signal deficiency (ESD)}, that predicts consistency of various variable selection methods. In particular, ESD reveals that the structure of the precision matrix plays a central role in consistency, and therefore so does the conditional independence structure of the predictors. To leverage this connection, we introduce the \emph{Conditional Independence knockoff}, a simple procedure that competes with more sophisticated knockoff filters and that is well defined when the predictors follow a Gaussian tree graphical model (or when the graph is sufficiently sparse). Our theoretical results are supported by numerical evidence on synthetic data.
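The FDR control described above rests on a data-dependent threshold applied to per-feature statistics $W_j$ (positive when the true feature beats its knockoff). A minimal sketch of that standard threshold rule, with synthetic statistics standing in for any particular construction:

```python
import numpy as np

def knockoff_threshold(W, q=0.1, offset=1):
    """Data-dependent threshold of the knockoff(+) filter.

    W      : array of feature statistics W_j (large positive => likely signal)
    q      : target false discovery rate
    offset : 1 for knockoff+ (exact FDR control), 0 for the basic variant
    """
    ts = np.sort(np.abs(W[W != 0]))              # candidate thresholds
    for t in ts:
        # Estimated false discovery proportion at threshold t
        fdp_hat = (offset + np.sum(W <= -t)) / max(np.sum(W >= t), 1)
        if fdp_hat <= q:
            return t
    return np.inf                                # no threshold meets the bound

# Illustrative usage with synthetic statistics (not from any real design):
rng = np.random.default_rng(0)
W = np.concatenate([rng.normal(3, 1, 20),        # "signals": large positive
                    rng.normal(0, 1, 80)])       # "nulls": symmetric about 0
t = knockoff_threshold(W, q=0.2)
selected = np.where(W >= t)[0]                   # indices declared non-null
```

The symmetry of null $W_j$ about zero is what makes $\#\{W_j \le -t\}$ a valid proxy for the number of false discoveries among $\{W_j \ge t\}$.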
Sparse minimum Redundancy Maximum Relevance for feature selection
Naylor, Peter, Poignard, Benjamin, Climente-González, Héctor, Yamada, Makoto
We propose a feature screening method that integrates both feature-feature and feature-target relationships. Inactive features are identified via a penalized minimum Redundancy Maximum Relevance (mRMR) procedure, a continuous version of the classic mRMR criterion penalized by a non-convex regularizer, in which features whose coefficients are estimated as zero form the set of inactive features. We establish the conditions under which zero coefficients are correctly identified, guaranteeing accurate recovery of the inactive features. We then introduce a multi-stage procedure based on the knockoff filter that enables the penalized mRMR to discard inactive features while controlling the false discovery rate (FDR). Our method performs comparably to HSIC-LASSO but is more conservative in the number of selected features; it only requires setting an FDR threshold, rather than specifying the number of features to retain. The effectiveness of the method is illustrated through simulations and real-world datasets. The code to reproduce this work is available on GitHub: https://github.com/PeterJackNaylor/SmRMR.
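For orientation, the classic (discrete, greedy) mRMR criterion that the paper relaxes trades relevance to the target against redundancy with already-selected features. A minimal sketch using absolute Pearson correlation as a stand-in for the dependence measure (an assumption for illustration; the paper instead solves a continuous, non-convexly penalized formulation):

```python
import numpy as np

def greedy_mrmr(X, y, k):
    """Greedy mRMR selection of k features.

    Score of a candidate j: relevance |corr(X_j, y)| minus the average
    redundancy |corr(X_j, X_s)| over features s already selected.
    """
    n, p = X.shape
    rel = np.abs(np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)]))
    red = np.abs(np.corrcoef(X, rowvar=False))   # feature-feature |corr|
    selected = [int(np.argmax(rel))]             # start from most relevant
    while len(selected) < k:
        remaining = [j for j in range(p) if j not in selected]
        scores = [rel[j] - red[j, selected].mean() for j in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected
```

Replacing the hard in/out decision with continuous coefficients, and the greedy loop with a penalized optimization, yields the screening problem studied in the paper.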
Reviews: DeepPINK: reproducible feature selection in deep neural networks
The paper proposes a method for feature selection in neural networks with a controlled error rate, quantified through the false discovery rate (FDR). To control the FDR, the paper uses the model-X knockoffs framework [2, 3, 10]: construct random features that satisfy certain distributional exchangeability properties with respect to the true features, and compute filter statistics from pairwise importance measures between each true feature and its knockoff. The choice of the importance function and of the knockoff filter is flexible. The novelty of this paper lies in using a neural network (an MLP) to obtain the importance measure, through a linear layer that couples the true and knockoff features pairwise. The final statistic depends both on the weights of the trainable linear layer and on the remaining network weights.
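The pairwise coupling idea in the review can be sketched as follows. This is a minimal illustration of the coupling layer, not the authors' exact architecture: each original feature and its knockoff share one output unit, so their filter weights compete feature by feature, and the weight magnitudes give a crude importance statistic (in the paper, the statistic also folds in the downstream MLP weights).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 4
X = rng.normal(size=(n, p))     # original features
Xk = rng.normal(size=(n, p))    # knockoff features (placeholder draw here)

# One trainable scalar weight per feature and per knockoff.
z, zk = rng.normal(size=p), rng.normal(size=p)

# Pairwise coupling layer: unit j sees only (x_j, xk_j), never other pairs.
H = X * z + Xk * zk             # shape (n, p); fed into the MLP downstream

# Knockoff-style statistic from the filter weights alone: W_j > 0 suggests
# the true feature x_j carries more weight than its knockoff.
W = z**2 - zk**2
```

The pairwise structure is what lets the learned weights be interpreted per feature, rather than being mixed across all inputs as in a dense first layer.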
Stability of a Generalized Debiased Lasso with Applications to Resampling-Based Variable Selection
Suppose that we first apply the Lasso to a design matrix, and then update one of its columns. In general, the signs of the Lasso coefficients may change, and there is no closed-form expression for updating the Lasso solution exactly. In this work, we propose an approximate formula for updating a debiased Lasso coefficient. We provide general nonasymptotic error bounds in terms of the norms and correlations of a given design matrix's columns, and then prove asymptotic convergence results for the case of a random design matrix with i.i.d.\ sub-Gaussian rows and i.i.d.\ Gaussian noise. Notably, the approximate formula is asymptotically correct for most coordinates in the proportional growth regime, under the mild assumption that each row of the design matrix is sub-Gaussian with a covariance matrix having a bounded condition number. Our proof only requires certain concentration and anti-concentration properties to control various error terms and the number of sign changes. In contrast, rigorously establishing distributional limit properties (e.g.\ Gaussian limits for the debiased Lasso) under similarly general assumptions has been considered an open problem in universality theory. As applications, we show that the approximate formula allows us to reduce the computational complexity of variable selection algorithms that require solving multiple Lasso problems, such as the conditional randomization test and a variant of the knockoff filter.
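For reference, the debiased Lasso coefficient being updated above takes the standard textbook form (this is the generic definition, not the paper's update formula): given the Lasso estimate $\hat\beta$ on design $X \in \mathbb{R}^{n \times p}$ with response $y$,
\[
\hat\beta^{\mathrm{d}} \;=\; \hat\beta \;+\; \frac{1}{n}\, M X^\top\bigl(y - X\hat\beta\bigr),
\]
where $M$ is an estimate of the inverse of the sample covariance $\hat\Sigma = X^\top X / n$. The one-step correction term is what admits an approximate closed-form update when a single column of $X$ changes, sparing a full re-solve of the Lasso.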
Differentially Private Variable Selection via the Knockoff Filter
Pournaderi, Mehrdad, Xiang, Yu
The knockoff filter, recently developed by Barber and Candès, is an effective procedure for performing variable selection with a controlled false discovery rate (FDR). We propose a private version of the knockoff filter by incorporating Gaussian and Laplace mechanisms, and show that variable selection with controlled FDR can still be achieved. Simulations demonstrate that our procedure retains reasonable statistical power.
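The Laplace mechanism mentioned above is the standard $\varepsilon$-differential-privacy primitive: add noise whose scale is the query's sensitivity divided by the privacy budget. A minimal sketch (how the paper calibrates the sensitivity of the knockoff statistics is not reproduced here):

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Release `value` with Laplace noise of scale sensitivity / epsilon.

    Satisfies epsilon-differential privacy when `sensitivity` bounds how
    much `value` can change between neighboring datasets.
    """
    if rng is None:
        rng = np.random.default_rng()
    scale = sensitivity / epsilon
    return value + rng.laplace(scale=scale, size=np.shape(value))

# Usage: privatize a vector of feature statistics before thresholding.
W = np.array([3.0, 0.2, -0.1, 2.5])
W_private = laplace_mechanism(W, sensitivity=1.0, epsilon=0.5,
                              rng=np.random.default_rng(42))
```

Smaller $\varepsilon$ (stronger privacy) means larger noise, which is the source of the power loss the simulations quantify.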