

 group lasso


Consistent feature selection for analytic deep neural networks

Neural Information Processing Systems

One of the most important steps toward interpretability and explainability of neural network models is feature selection, which aims to identify the subset of relevant features. Theoretical results in the field have mostly focused on the prediction aspect of the problem with virtually no work on feature selection consistency for deep neural networks due to the model's severe nonlinearity and unidentifiability. This lack of theoretical foundation casts doubt on the applicability of deep learning to contexts where correct interpretations of the features play a central role. In this work, we investigate the problem of feature selection for analytic deep networks. We prove that for a wide class of networks, including deep feed-forward neural networks, convolutional neural networks and a major sub-class of residual neural networks, the Adaptive Group Lasso selection procedure with Group Lasso as the base estimator is selection-consistent. The work provides further evidence that Group Lasso might be inefficient for feature selection with neural networks and advocates the use of Adaptive Group Lasso over the popular Group Lasso.
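The two-stage procedure named above (a Group Lasso base fit, then an Adaptive Group Lasso refit whose group weights come from the base fit) can be sketched as follows. This is a minimal illustration, not the paper's method. It assumes a linear least-squares model in place of an analytic deep network; in the network setting, each group would typically hold the first-layer weights attached to one input feature. It also assumes proximal gradient as the solver and weights of the form 1/‖β̂_g‖^γ. All function names (`group_soft_threshold`, `weighted_group_lasso`, `adaptive_group_lasso`) and the parameters `lam1`, `lam2`, `gamma` are hypothetical.

```python
import numpy as np


def group_soft_threshold(v, t):
    """Proximal operator of t*||v||_2: shrinks the whole block, or zeroes it."""
    norm = np.linalg.norm(v)
    return np.zeros_like(v) if norm <= t else (1 - t / norm) * v


def weighted_group_lasso(X, y, groups, lam, weights, n_iter=2000):
    """Proximal gradient for 0.5/n*||y - X b||^2 + lam * sum_g weights[g]*||b_g||_2.

    Assumed setting: linear least squares stands in for the network loss.
    """
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1/L, L = Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        z = beta - step * X.T @ (X @ beta - y) / n  # gradient step on the smooth part
        for g, w in zip(groups, weights):
            if np.isinf(w):
                z[g] = 0.0  # infinite weight: the group is excluded outright
            else:
                z[g] = group_soft_threshold(z[g], step * lam * w)
        beta = z
    return beta


def adaptive_group_lasso(X, y, groups, lam1, lam2, gamma=1.0):
    # Stage 1: plain Group Lasso (unit weights) as the base estimator.
    base = weighted_group_lasso(X, y, groups, lam1, [1.0] * len(groups))
    # Stage 2: weights 1/||base_g||^gamma; groups zeroed in stage 1 get weight inf.
    norms = [np.linalg.norm(base[g]) for g in groups]
    weights = [np.inf if nrm == 0 else nrm ** (-gamma) for nrm in norms]
    return weighted_group_lasso(X, y, groups, lam2, weights)
```

The design choice the abstract argues for is visible in stage 2. Groups with small base estimates receive large penalties and are pushed to exactly zero, while strong groups are penalized lightly, which reduces the shrinkage bias that plain Group Lasso places on relevant features.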


Smooth Bilevel Programming for Sparse Regularization

Neural Information Processing Systems

Iteratively reweighted least squares (IRLS) is a popular approach to solving sparsity-enforcing regression problems in machine learning. State-of-the-art approaches are more efficient but typically rely on specific coordinate pruning schemes. In this work, we show how a surprisingly simple re-parametrization of IRLS, coupled with a bilevel resolution (instead of an alternating scheme), is able to achieve top performance across a wide range of sparsity types (such as Lasso, group Lasso and trace norm regularizations), regularization strengths (including hard constraints), and design matrices (ranging from correlated designs to differential operators). Like IRLS, our method only involves solving linear systems, but in sharp contrast, it corresponds to the minimization of a smooth function. Despite being non-convex, we show that there are no spurious minima and that saddle points are "ridable", so that there always exists a descent direction. We thus advocate for the use of a BFGS quasi-Newton solver, which makes our approach simple, robust and efficient. We perform a numerical benchmark of the convergence speed of our algorithm against state-of-the-art solvers for Lasso, group Lasso, trace norm and linearly constrained problems. These results highlight the versatility of our approach, removing the need to use different solvers depending on the specificity of the ML problem under study.
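The abstract does not spell out the re-parametrization. One well-known variant for the Lasso writes β = u ⊙ v and uses the identity ‖β‖₁ = min over u ⊙ v = β of (‖u‖² + ‖v‖²)/2. For fixed v, the problem in u is a ridge regression solved by one linear system, which leaves a smooth outer function of v that BFGS can minimize. The sketch below assumes a least-squares loss 0.5‖Xβ − y‖² + λ‖β‖₁ and assumes this Hadamard-product formulation is representative of the paper's approach. The function name `lasso_via_hadamard` is hypothetical, and SciPy's BFGS stands in for the authors' solver.

```python
import numpy as np
from scipy.optimize import minimize


def lasso_via_hadamard(X, y, lam, v0=None):
    """Sketch: min_beta 0.5*||X beta - y||^2 + lam*||beta||_1 via beta = u * v.

    Assumed formulation: the outer variable is v; u is eliminated exactly.
    """
    n, p = X.shape
    v0 = np.ones(p) if v0 is None else v0

    def inner_u(v):
        # Ridge regression with design X @ diag(v):
        # (diag(v) X^T X diag(v) + lam I) u = diag(v) X^T y
        Xv = X * v  # broadcasting scales column j of X by v[j]
        A = Xv.T @ Xv + lam * np.eye(p)
        return np.linalg.solve(A, Xv.T @ y)

    def f_and_grad(v):
        u = inner_u(v)
        r = X @ (u * v) - y
        f = 0.5 * r @ r + 0.5 * lam * (u @ u + v @ v)
        # Envelope theorem: u is optimal for this v, so only the explicit
        # dependence on v contributes to the gradient.
        grad = u * (X.T @ r) + lam * v
        return f, grad

    res = minimize(f_and_grad, v0, jac=True, method="BFGS")
    v = res.x
    return inner_u(v) * v
```

With an identity design the Lasso solution is soft thresholding of y at λ, so `lasso_via_hadamard(np.eye(3), np.array([3.0, -2.0, 0.5]), 1.0)` should return approximately [2, -1, 0]. The only linear algebra per evaluation is one p × p solve, which matches the abstract's point that the method, like IRLS, reduces to linear systems while the outer objective stays smooth.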



Selective inference for group-sparse linear models

Fan Yang, Rina Foygel Barber, Prateek Jain, John Lafferty

Neural Information Processing Systems

The fundamental challenge is that after the data have been used to select a set of coefficients to be studied, this selection event must then be accounted for when performing inference, using the same data.



Distributed Machine Learning with Sparse Heterogeneous Data

Neural Information Processing Systems

This increase in data sources has led to applications that are increasingly high-dimensional. To be both statistically and computationally efficient in this setting, it is then important to develop approaches that can exploit the structure within the data.


Efficient Group Lasso Regularized Rank Regression with Data-Driven Parameter Determination

Meixia Lin, Meijiao Shi, Yunhai Xiao, Qian Zhang

arXiv.org Machine Learning

High-dimensional regression often suffers from heavy-tailed noise and outliers, which can severely undermine the reliability of least-squares-based methods. To improve robustness, we adopt a non-smooth, Wilcoxon-score-based rank objective and incorporate structured group sparsity regularization, a natural generalization of the lasso, yielding a group lasso regularized rank regression method. By extending the tuning-free parameter selection scheme originally developed for the lasso, we introduce a data-driven, simulation-based tuning rule and further establish a finite-sample error bound for the resulting estimator. On the computational side, we develop a proximal augmented Lagrangian method for solving the associated optimization problem, which eliminates the singularity issues encountered in existing methods, thereby enabling efficient semismooth Newton updates for the subproblems. Extensive numerical experiments demonstrate the robustness and effectiveness of our proposed estimator against alternatives, and showcase the scalability of the algorithm across both simulated and real-data settings.
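The reason a simulation-based, tuning-free rule is possible for rank regression can be shown in a few lines. At the true coefficients, the ranks of the residuals form a uniformly random permutation whatever the noise distribution, and the (sub)gradient of the Wilcoxon loss depends on the residuals only through those ranks. Its distribution can therefore be simulated from X alone. The sketch assumes the loss normalization 1/(n(n−1)) Σ_{i≠j} |e_i − e_j|, group weights √|g|, and the constants `c` and `alpha`; the group-norm extension and all function names are hypothetical illustrations, not the paper's exact rule.

```python
import numpy as np


def wilcoxon_loss(X, y, beta):
    """Assumed normalization: 1/(n(n-1)) * sum_{i != j} |e_i - e_j|, e = y - X beta."""
    e = y - X @ beta
    n = len(e)
    return np.abs(e[:, None] - e[None, :]).sum() / (n * (n - 1))


def wilcoxon_grad(X, y, beta):
    """Gradient of wilcoxon_loss where the residuals have no ties.

    It depends on the data only through the residual ranks:
    -2/(n(n-1)) * X^T (2*rank - (n+1)).
    """
    e = y - X @ beta
    n = len(e)
    ranks = np.argsort(np.argsort(e)) + 1  # ranks 1..n
    return -2.0 / (n * (n - 1)) * (X.T @ (2 * ranks - (n + 1)))


def simulated_lambda(X, groups, n_sim=500, alpha=0.1, c=1.1, seed=0):
    """Hypothetical group version of the simulation-based tuning rule.

    Simulates the gradient at the truth via random rank permutations and
    returns c times the (1 - alpha) quantile of its group-wise dual norm
    max_g ||S_g||_2 / sqrt(|g|).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    stats = np.empty(n_sim)
    for b in range(n_sim):
        ranks = rng.permutation(n) + 1
        S = -2.0 / (n * (n - 1)) * (X.T @ (2 * ranks - (n + 1)))
        stats[b] = max(np.linalg.norm(S[g]) / np.sqrt(len(g)) for g in groups)
    return c * np.quantile(stats, 1 - alpha)
```

The dual norm of the penalty Σ_g √|g| ‖β_g‖₂ is max_g ‖S_g‖₂ / √|g|, so choosing λ above a high quantile of that statistic makes it likely that λ dominates the loss gradient at the true coefficients, the usual requirement behind lasso-type error bounds.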


Structure-Aware Compound-Protein Affinity Prediction via Graph Neural Network with Group Lasso Regularization

Zanyu Shi, Yang Wang, Pathum Weerawarna, Jie Zhang, Timothy Richardson, Yijie Wang, Kun Huang

arXiv.org Artificial Intelligence

Explainable artificial intelligence (XAI) approaches have been increasingly applied in drug discovery to learn molecular representations and identify substructures driving property predictions. However, building end-to-end explainable models for structure-activity relationship (SAR) modeling for compound property prediction faces many challenges, such as the limited amount of compound-protein interaction activity data for specific protein targets, and the many subtle changes at molecular configuration sites that significantly affect molecular properties. We exploit pairs of molecules with activity cliffs that share scaffolds but differ at substituent sites, characterized by large potency differences for specific protein targets. We propose a framework that uses graph neural networks (GNNs) to leverage property and structure information from activity cliff pairs to predict compound-protein affinity (i.e., half-maximal inhibitory concentration, IC50). To enhance model performance and explainability, we train GNNs with structure-aware loss functions using group lasso and sparse group lasso regularizations, which prune and highlight molecular subgraphs relevant to activity differences. We applied this framework to activity cliff data of molecules targeting three proto-oncogene tyrosine-protein kinase Src proteins (PDB IDs: 1O42, 2H8H, 4MXO). Our approach improved property prediction by integrating common and uncommon node information with sparse group lasso, as reflected in reduced root mean squared error (RMSE) and improved Pearson's correlation coefficient (PCC). Applying the regularizations also enhances feature attribution for GNNs by boosting graph-level global direction scores and improving atom-level coloring accuracy. These advances strengthen model interpretability in drug discovery pipelines, particularly for identifying critical molecular substructures in lead optimization.
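The sparse group lasso mentioned above mixes an elementwise ℓ1 penalty with a group ℓ2 penalty, so it can zero out whole groups (for example, parameters tied to one molecular substructure) and also individual entries inside surviving groups. A common way to apply it is its proximal operator, shown below for non-overlapping groups: elementwise soft-thresholding followed by group shrinkage. This is a generic sketch, not the paper's training code. The abstract does not say how the penalty is optimized (it may simply be added to the loss and trained with autograd), and the mixing parameter `alpha`, the √|g| group weights and the function names are assumptions.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise prox of t*||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def prox_sparse_group_lasso(theta, groups, lam, alpha, step=1.0):
    """Prox of step*lam*(alpha*||theta||_1 + (1-alpha)*sum_g sqrt(|g|)*||theta_g||_2).

    Valid for non-overlapping groups: soft-threshold every entry first,
    then shrink (or zero) each group as a block.
    """
    out = soft_threshold(theta, step * lam * alpha)
    for g in groups:
        t = step * lam * (1 - alpha) * np.sqrt(len(g))
        norm = np.linalg.norm(out[g])
        out[g] = 0.0 if norm <= t else (1 - t / norm) * out[g]
    return out
```

For example, with `theta = [3.0, -0.5, 0.2, 0.1]`, groups `[[0, 1], [2, 3]]`, `lam=1.0` and `alpha=0.5`, the second group is removed entirely, the small entry of the first group is zeroed, and only the first coordinate survives, shrunk to 2.5 − √2/2.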


Exclusive Feature Learning on Arbitrary Structures via $\ell_{1,2}$-norm

Deguang Kong, Ryohei Fujimaki, Ji Liu, Feiping Nie, Chris Ding

Neural Information Processing Systems

Group LASSO is widely used to enforce structural sparsity, which achieves sparsity at the inter-group level. In this paper, we propose a new formulation called "exclusive group LASSO", which brings out sparsity at the intra-group level in the context of feature selection. The proposed exclusive group LASSO is applicable to any feature structure, whether the groups overlap or not. We analyze the properties of exclusive group LASSO and propose an effective iteratively re-weighted algorithm, with rigorous convergence analysis, to solve the corresponding optimization problem. We show applications of exclusive group LASSO to uncorrelated feature selection. Extensive experiments on both synthetic and real-world datasets validate the proposed method.
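An iteratively re-weighted scheme for an exclusive (ℓ1,2) penalty can be sketched as follows. The sketch assumes the penalty takes the common form Σ_g ‖w_g‖₁² (groups may overlap) and uses a squared loss ‖Xw − y‖². Setting the gradient to zero gives (XᵀX + λF)w = Xᵀy with diagonal F_ii = Σ_{g∋i} ‖w_g‖₁ / |w_i|, and the sketch repeatedly recomputes F from the current w and re-solves. The `eps` guard, the ridge warm start and the function names are assumptions; this is not claimed to be the authors' exact algorithm.

```python
import numpy as np


def exclusive_penalty(w, groups):
    """Assumed l_{1,2} form: sum over (possibly overlapping) groups of squared l1 norms."""
    return sum(np.abs(w[g]).sum() ** 2 for g in groups)


def exclusive_lasso_irls(X, y, groups, lam, n_iter=100, eps=1e-8):
    """Sketch of a re-weighted solver for ||X w - y||^2 + lam * sum_g ||w_g||_1^2.

    Stationarity gives (X^T X + lam * F) w = X^T y with
    F_ii = sum over groups g containing i of ||w_g||_1 / |w_i|.
    """
    p = X.shape[1]
    XtX, Xty = X.T @ X, X.T @ y
    w = np.linalg.solve(XtX + lam * np.eye(p), Xty)  # ridge warm start
    for _ in range(n_iter):
        absw = np.abs(w) + eps  # eps guards against division by zero
        F = np.zeros(p)
        for g in groups:
            # accumulate over every group containing feature i (handles overlap)
            F[g] += np.abs(w[g]).sum() / absw[g]
        w = np.linalg.solve(XtX + lam * np.diag(F), Xty)
    return w
```

Because each group's penalty is the square of an ℓ1 norm, features within the same group compete with one another, which is what produces the intra-group sparsity and the preference for uncorrelated features described in the abstract.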