Kernel Feature Selection via Conditional Covariance Minimization
We propose a method for feature selection that employs kernel-based measures of independence to find a subset of covariates that is maximally predictive of the response. Building on past work in kernel dimension reduction, we show how to perform feature selection via a constrained optimization problem involving the trace of the conditional covariance operator. We prove various consistency results for this procedure, and also demonstrate that our method compares favorably with other state-of-the-art algorithms on a variety of synthetic and real data sets.
Reviews: Kernel Feature Selection via Conditional Covariance Minimization
In this paper, the authors propose a new nonlinear feature selection method based on kernels. More specifically, the conditional covariance operator is employed to measure the conditional dependence between Y and X given a subset of X. Feature selection is then performed by searching for the set of features that minimizes this conditional dependence. The resulting optimization problem involves minimizing over a matrix inverse, which is hard to optimize, so a novel approach to handling the matrix inverse is also proposed.
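As a rough illustration of the criterion the review describes, the sketch below scores a candidate feature subset T by eps * Tr[G_Y (G_{X_T} + n * eps * I)^{-1}], a commonly used empirical form of the trace of the conditional covariance operator built from centered kernel (Gram) matrices. The Gaussian kernel on X, the linear kernel on Y, the bandwidth, the regularization value eps, the function names, and the synthetic data are all assumptions made for illustration. None of them come from the text above, and this is not the authors' implementation: it scores single features by brute force rather than solving the constrained optimization problem.

```python
import numpy as np


def centered_gram(K):
    """Center a kernel matrix: H K H with H = I - (1/n) 1 1^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def gaussian_gram(X, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix for the rows of X (assumed kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * bandwidth ** 2))


def conditional_cov_trace(X, y, subset, eps=1e-2):
    """Empirical trace criterion for the features in `subset` (lower is better).

    Hypothetical helper: computes eps * Tr[(G_X + n*eps*I)^{-1} G_Y], which
    equals eps * Tr[G_Y (G_X + n*eps*I)^{-1}] by the cyclic trace property.
    """
    n = X.shape[0]
    G_x = centered_gram(gaussian_gram(X[:, subset]))
    Y = y.reshape(n, -1)
    G_y = centered_gram(Y @ Y.T)  # linear kernel on the response
    # Solve a linear system instead of forming the matrix inverse explicitly.
    return eps * np.trace(np.linalg.solve(G_x + n * eps * np.eye(n), G_y))


# Synthetic data: only feature 0 carries information about y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

# Score each single-feature subset and pick the one with the smallest criterion.
scores = {j: conditional_cov_trace(X, y, [j]) for j in range(X.shape[1])}
best = min(scores, key=scores.get)
print(scores, best)
```

With a linear kernel on Y, this quantity coincides with a regularized kernel regression training loss, so a feature that predicts the response well yields a smaller value than an irrelevant one. Each evaluation solves an n-by-n linear system, which hints at why optimizing this objective over subsets is computationally demanding.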
Kernel Feature Selection via Conditional Covariance Minimization
Jianbo Chen, Mitchell Stern, Martin J. Wainwright, Michael I. Jordan
Papers published at the Neural Information Processing Systems Conference.
Feature selection is an important problem in statistical machine learning, and is a common method for dimensionality reduction that encourages model interpretability. With large data sets becoming ever more prevalent, feature selection has seen widespread usage across a variety of real-world tasks in recent years, including text classification, gene selection from microarray data, and face recognition [3, 13, 17]. In this work, we consider the supervised variant of feature selection, which entails finding a subset of the input features that explains the output well. This practice can reduce the computational expense of downstream learning by removing features that are redundant or noisy, while simultaneously providing insight into the data through the features that remain. Feature selection algorithms can generally be divided into three main categories: filter methods, wrapper methods, and embedded methods [13].