selection consistency
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Hong Kong (0.05)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.50)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.50)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.41)
A Nonparametric Statistics Approach to Feature Selection in Deep Neural Networks with Theoretical Guarantees
Du, Junye, Li, Zhenghao, Gu, Zhutong, Feng, Long
This paper tackles the problem of feature selection in a highly challenging setting: $\mathbb{E}(y | \boldsymbol{x}) = G(\boldsymbol{x}_{\mathcal{S}_0})$, where $\mathcal{S}_0$ is the set of relevant features and $G$ is an unknown, potentially nonlinear function subject to mild smoothness conditions. Our approach begins with feature selection in deep neural networks, then generalizes the results to H{ö}lder smooth functions by exploiting the strong approximation capabilities of neural networks. Unlike conventional optimization-based deep learning methods, we reformulate neural networks as index models and estimate $\mathcal{S}_0$ using the second-order Stein's formula. This gradient-descent-free strategy guarantees feature selection consistency with a sample size requirement of $n = Ω(p^2)$, where $p$ is the feature dimension. To handle high-dimensional scenarios, we further introduce a screening-and-selection mechanism that achieves nonlinear selection consistency when $n = Ω(s \log p)$, with $s$ representing the sparsity level. Additionally, we refit a neural network on the selected features for prediction and establish performance guarantees under a relaxed sparsity assumption. Extensive simulations and real-data analyses demonstrate the strong performance of our method even in the presence of complex feature interactions.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Hong Kong (0.04)
- Asia > China > Hong Kong (0.05)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.61)
On the consistency theory of high dimensional variable screening
Xiangyu Wang, Chenlei Leng, David B. Dunson
V ariable screening is a fast dimension reduction technique for assisting high dimensional feature selection. As a preselection method, it selects a moderate size subset of candidate variables for further refining via feature selection to produce the final model. The performance of variable screening depends on both computational efficiency and the ability to dramatically reduce the number of variables without discarding the important ones. When the data dimension p is substantially larger than the sample size n, variable screening becomes crucial as 1) Faster feature selection algorithms are needed; 2) Conditions guaranteeing selection consistency might fail to hold. This article studies a class of linear screening methods and establishes consistency theory for this special class. In particular, we prove the restricted diagonally dominant (RDD) condition is a necessary and sufficient condition for strong screening consistency.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Illinois (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- North America > Canada (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > Canada > Nova Scotia > Halifax Regional Municipality > Halifax (0.04)
Sparse Deep Additive Model with Interactions: Enhancing Interpretability and Predictability
Hung, Yi-Ting, Lin, Li-Hsiang, Calhoun, Vince D.
Recent advances in deep learning highlight the need for personalized models that can learn from small or moderate samples, handle high dimensional features, and remain interpretable. To address this challenge, we propose the Sparse Deep Additive Model with Interactions (SDAMI), a framework that combines sparsity driven feature selection with deep subnetworks for flexible function approximation. Unlike conventional deep learning models, which often function as black boxes, SDAMI explicitly disentangles main effects and interaction effects to enhance interpretability. At the same time, its deep additive structure achieves higher predictive accuracy than classical additive models. Central to SDAMI is the concept of an Effect Footprint, which assumes that higher order interactions project marginally onto main effects. Guided by this principle, SDAMI adopts a two stage strategy: first, identify strong main effects that implicitly carry information about important interactions. second, exploit this information through structured regularization such as group lasso to distinguish genuine main effects from interaction effects. For each selected main effect, SDAMI constructs a dedicated subnetwork, enabling nonlinear function approximation while preserving interpretability and providing a structured foundation for modeling interactions. Extensive simulations with comparisons confirm SDAMI$'$s ability to recover effect structures across diverse scenarios, while applications in reliability analysis, neuroscience, and medical diagnostics further demonstrate its versatility in addressing real-world high-dimensional modeling challenges.
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Consistent causal discovery with equal error variances: a least-squares perspective
Chaudhuri, Anamitra, Ni, Yang, Bhattacharya, Anirban
We consider the problem of recovering the true causal structure among a set of variables, generated by a linear acyclic structural equation model (SEM) with the error terms being independent and having equal variances. It is well-known that the true underlying directed acyclic graph (DAG) encoding the causal structure is uniquely identifiable under this assumption. In this work, we establish that the sum of minimum expected squared errors for every variable, while predicted by the best linear combination of its parent variables, is minimised if and only if the causal structure is represented by any supergraph of the true DAG. This property is further utilised to design a Bayesian DAG selection method that recovers the true graph consistently.
- North America > United States > Texas > Brazos County > College Station (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)