A Nonparametric Statistics Approach to Feature Selection in Deep Neural Networks with Theoretical Guarantees

Du, Junye, Li, Zhenghao, Gu, Zhutong, Feng, Long

arXiv.org Machine Learning

This paper tackles the problem of feature selection in a highly challenging setting: $\mathbb{E}(y | \boldsymbol{x}) = G(\boldsymbol{x}_{\mathcal{S}_0})$, where $\mathcal{S}_0$ is the set of relevant features and $G$ is an unknown, potentially nonlinear function subject to mild smoothness conditions. Our approach begins with feature selection in deep neural networks, then generalizes the results to Hölder smooth functions by exploiting the strong approximation capabilities of neural networks. Unlike conventional optimization-based deep learning methods, we reformulate neural networks as index models and estimate $\mathcal{S}_0$ using the second-order Stein's formula. This gradient-descent-free strategy guarantees feature selection consistency with a sample size requirement of $n = \Omega(p^2)$, where $p$ is the feature dimension. To handle high-dimensional scenarios, we further introduce a screening-and-selection mechanism that achieves nonlinear selection consistency when $n = \Omega(s \log p)$, with $s$ representing the sparsity level. Additionally, we refit a neural network on the selected features for prediction and establish performance guarantees under a relaxed sparsity assumption. Extensive simulations and real-data analyses demonstrate the strong performance of our method even in the presence of complex feature interactions.
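The gradient-free idea can be illustrated with the second-order Stein identity: for a standard Gaussian design, $\mathbb{E}[y(\boldsymbol{x}\boldsymbol{x}^\top - I)]$ targets the expected Hessian of $G$, whose nonzero rows flag the relevant features. Below is a minimal sketch under assumptions not taken from the paper (standard Gaussian design, known sparsity level, a toy $G$); it is not the authors' full screening-and-selection procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20000, 6
X = rng.standard_normal((n, p))     # standard Gaussian design (assumed)
S0 = [0, 1]                         # true relevant set, known size for the demo
y = np.sin(X[:, 0]) * X[:, 1] + 0.1 * rng.standard_normal(n)

# Second-order Stein estimate: M_hat ~ E[y (x x^T - I)], a moment proxy
# for the expected Hessian of G; no gradient descent is involved.
M_hat = (X.T * y) @ X / n - np.mean(y) * np.eye(p)

# Rank features by the Euclidean norm of their row of M_hat and keep the top |S0|
scores = np.linalg.norm(M_hat, axis=1)
selected = np.argsort(scores)[-len(S0):]
print(sorted(selected.tolist()))
```

With this interaction-only signal the information sits in the off-diagonal $(0,1)$ entry of the Hessian, so a purely marginal (first-order) screen could miss it, while the second-order moment recovers both features.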


Bayesian Model Selection Approach to Boundary Detection with Non-Local Priors

Fei Jiang, Guosheng Yin, Francesca Dominici

Neural Information Processing Systems

Based on non-local prior distributions, we propose a Bayesian model selection (BMS) procedure for boundary detection in a sequence of data with multiple systematic mean changes. The BMS method can effectively suppress the non-boundary spike points with large instantaneous changes.
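The model-selection view of boundary detection can be sketched on a toy single-change sequence. The snippet below scores candidate boundaries with BIC as a crude stand-in for the paper's non-local-prior marginal likelihood (the data, jump size, and minimum-segment length are all hypothetical choices for the demo):

```python
import numpy as np

rng = np.random.default_rng(1)
# Piecewise-constant mean with one true boundary after index 60 (toy data)
y = np.concatenate([rng.normal(0.0, 1.0, 60), rng.normal(3.0, 1.0, 40)])
n = len(y)

def bic_one_change(y, k):
    """BIC of a two-segment model with the boundary placed after position k."""
    left, right = y[:k], y[k:]
    rss = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
    return len(y) * np.log(rss / len(y)) + 2 * np.log(len(y))  # 2 mean params

# Compare all candidate boundaries (minimum segment length 5) and keep the best
scores = {k: bic_one_change(y, k) for k in range(5, n - 5)}
k_hat = min(scores, key=scores.get)
print(k_hat)
```

An isolated spike shifts only one observation's contribution to the residual sum of squares, so a whole-segment criterion like this is far less likely to place a boundary at it than a pointwise change statistic would be, which is the intuition behind suppressing non-boundary spike points.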


On the consistency theory of high dimensional variable screening

Xiangyu Wang, Chenlei Leng, David B. Dunson

Neural Information Processing Systems

Variable screening is a fast dimension reduction technique for assisting high dimensional feature selection. As a preselection method, it selects a moderate size subset of candidate variables for further refining via feature selection to produce the final model. The performance of variable screening depends on both computational efficiency and the ability to dramatically reduce the number of variables without discarding the important ones. When the data dimension p is substantially larger than the sample size n, variable screening becomes crucial as 1) faster feature selection algorithms are needed; 2) conditions guaranteeing selection consistency might fail to hold. This article studies a class of linear screening methods and establishes consistency theory for this special class. In particular, we prove the restricted diagonally dominant (RDD) condition is a necessary and sufficient condition for strong screening consistency.
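A familiar member of the linear screening class is marginal-correlation (sure independence) screening. The sketch below, with an assumed sparse linear model and the common screening-set size $d = n/\log n$, shows the preselection step; the subsequent refinement by a full feature-selection method is omitted:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, s = 200, 1000, 3                     # p >> n regime
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = [3.0, -2.0, 1.5]                # sparse true signal (toy choice)
y = X @ beta + rng.standard_normal(n)

# Linear screening: rank features by |marginal correlation with y|
corr = np.abs(X.T @ y) / n
d = int(n / np.log(n))                     # moderate screening-set size
kept = np.argsort(corr)[-d:]

# Strong screening consistency: all important variables survive the screen
print(all(j in kept for j in range(s)))
```

The point of screening consistency theory is exactly the guarantee checked in the last line: the retained subset of size $d \ll p$ contains every important variable, so the slower downstream selector only has to work on $d$ candidates.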





Sparse Deep Additive Model with Interactions: Enhancing Interpretability and Predictability

Hung, Yi-Ting, Lin, Li-Hsiang, Calhoun, Vince D.

arXiv.org Machine Learning

Recent advances in deep learning highlight the need for personalized models that can learn from small or moderate samples, handle high-dimensional features, and remain interpretable. To address this challenge, we propose the Sparse Deep Additive Model with Interactions (SDAMI), a framework that combines sparsity-driven feature selection with deep subnetworks for flexible function approximation. Unlike conventional deep learning models, which often function as black boxes, SDAMI explicitly disentangles main effects and interaction effects to enhance interpretability. At the same time, its deep additive structure achieves higher predictive accuracy than classical additive models. Central to SDAMI is the concept of an Effect Footprint, which assumes that higher-order interactions project marginally onto main effects. Guided by this principle, SDAMI adopts a two-stage strategy: first, identify strong main effects that implicitly carry information about important interactions; second, exploit this information through structured regularization such as group lasso to distinguish genuine main effects from interaction effects. For each selected main effect, SDAMI constructs a dedicated subnetwork, enabling nonlinear function approximation while preserving interpretability and providing a structured foundation for modeling interactions. Extensive simulations with comparisons confirm SDAMI's ability to recover effect structures across diverse scenarios, while applications in reliability analysis, neuroscience, and medical diagnostics further demonstrate its versatility in addressing real-world high-dimensional modeling challenges.
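The group-lasso selection step can be illustrated in isolation. The sketch below is a simplified stand-in for SDAMI's first stage: each feature's "group" is a polynomial basis expansion rather than a subnetwork, and the penalty level, basis size, and toy additive model are all hypothetical choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, k = 400, 8, 4                  # k basis functions per feature (toy choice)
X = rng.uniform(-1, 1, (n, p))
y = np.sin(np.pi * X[:, 0]) + X[:, 1]**2 + 0.1 * rng.standard_normal(n)

# One group of polynomial features x, x^2, ..., x^k per original feature
Phi = np.hstack([np.column_stack([X[:, j]**d for d in range(1, k + 1)])
                 for j in range(p)])
Phi = Phi - Phi.mean(axis=0)         # center so even powers don't absorb the mean
y = y - y.mean()
groups = [list(range(j * k, (j + 1) * k)) for j in range(p)]

def group_lasso(Phi, y, groups, lam, iters=5000):
    """Proximal gradient (ISTA) for least squares with a group-lasso penalty."""
    n, m = Phi.shape
    lr = n / np.linalg.norm(Phi, 2)**2      # 1/L for the smooth part
    beta = np.zeros(m)
    for _ in range(iters):
        beta = beta - lr * (Phi.T @ (Phi @ beta - y) / n)
        for g in groups:                    # block soft-thresholding (prox step)
            nrm = np.linalg.norm(beta[g])
            beta[g] = 0.0 if nrm <= lr * lam else beta[g] * (1 - lr * lam / nrm)
    return beta

beta = group_lasso(Phi, y, groups, lam=0.05)
active = [j for j, g in enumerate(groups) if np.linalg.norm(beta[g]) > 1e-8]
print(active)                        # indices of selected main-effect groups
```

Because the penalty acts on whole groups, a feature enters or leaves the model with its entire basis block; in SDAMI the surviving groups would each be handed to a dedicated subnetwork.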


Consistent causal discovery with equal error variances: a least-squares perspective

Chaudhuri, Anamitra, Ni, Yang, Bhattacharya, Anirban

arXiv.org Machine Learning

We consider the problem of recovering the true causal structure among a set of variables, generated by a linear acyclic structural equation model (SEM) with the error terms being independent and having equal variances. It is well-known that the true underlying directed acyclic graph (DAG) encoding the causal structure is uniquely identifiable under this assumption. In this work, we establish that the sum of minimum expected squared errors for every variable, while predicted by the best linear combination of its parent variables, is minimised if and only if the causal structure is represented by any supergraph of the true DAG. This property is further utilised to design a Bayesian DAG selection method that recovers the true graph consistently.
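The least-squares characterization can be checked directly on a small chain DAG: summing, over a candidate ordering, the residual variance of each variable regressed on its predecessors, the minimum is attained at the true topological ordering. This brute-force sketch over all orderings is an illustration of the identifiability result, not the paper's Bayesian selection method; the SEM below is a toy example.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(4)
n = 5000
# Linear SEM with equal error variances: x0 -> x1 -> x2
e = rng.standard_normal((n, 3))
x0 = e[:, 0]
x1 = 0.8 * x0 + e[:, 1]
x2 = -0.6 * x1 + e[:, 2]
X = np.column_stack([x0, x1, x2])

def score(order):
    """Sum of residual variances, each variable regressed on its predecessors."""
    total = 0.0
    for i, v in enumerate(order):
        parents = list(order[:i])
        if parents:
            coef, *_ = np.linalg.lstsq(X[:, parents], X[:, v], rcond=None)
            resid = X[:, v] - X[:, parents] @ coef
        else:
            resid = X[:, v]
        total += resid.var()
    return total

best = min(permutations(range(3)), key=score)
print(best)                          # minimiser of the least-squares score
```

At the true ordering the score converges to the sum of the (equal) error variances; any ordering that breaks an edge direction must absorb extra variance, which is why the minimiser identifies the causal structure.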