LassoNet
All models can be trained entirely on CPUs on consumer-grade laptop machines within minutes or hours. Execution times per epoch for the single-cell data with 529 features are as follows: Base=0.9,

Centering the first frame: for golfing and waving, the root point of the first frame is moved to the origin (0, 0, 0).

To map putative transcription factor (TF) and target gene relationships, we use as a reference a regulatory network generated using the gene expression and chromatin accessibility features available in the human immune cells dataset. Our rule for successfully mapping a TF to a target gene through a chromatin peak is that the TF, the chromatin peak, and the target gene must all appear simultaneously in the list of features selected by the rank_genes_groups function for the cell type of interest, and there must be TF motifs linked to that transcription factor in the chromatin peak.
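The TF-to-target mapping rule above can be sketched as a small predicate. All names here (`maps_tf_to_target`, `selected_features`, `motif_links`) are hypothetical stand-ins for the rank_genes_groups output and the motif-to-peak annotations; this is an illustration of the rule, not the authors' code:

```python
def maps_tf_to_target(tf, peak, gene, selected_features, motif_links):
    """Return True if the TF maps to the target gene through the peak.

    Rule: the TF, the chromatin peak, and the target gene must all be in
    the set of features selected for the cell type of interest, and the
    peak must carry a motif linked to that TF.
    """
    all_selected = {tf, peak, gene} <= set(selected_features)
    has_motif = tf in motif_links.get(peak, set())
    return all_selected and has_motif
```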
Indirectly Parameterized Concrete Autoencoders
Nilsson, Alfred, Wijk, Klas, Gutha, Sai Bharath Chandra, Englesson, Erik, Hotti, Alexandra, Saccardi, Carlo, Kviman, Oskar, Lagergren, Jens, Vinuesa, Ricardo, Azizpour, Hossein
Feature selection is a crucial task in settings where data is high-dimensional or acquiring the full set of features is costly. Recent developments in neural network-based embedded feature selection show promising results across a wide range of applications. Concrete Autoencoders (CAEs), considered state-of-the-art in embedded feature selection, may struggle to achieve stable joint optimization, hurting their training time and generalization. In this work, we identify that this instability is correlated with the CAE learning duplicate selections. To remedy this, we propose a simple and effective improvement: Indirectly Parameterized CAEs (IP-CAEs). IP-CAEs learn an embedding and a mapping from it to the Gumbel-Softmax distributions' parameters. Despite being simple to implement, IP-CAE exhibits significant and consistent improvements over CAE in both generalization and training time across several datasets for reconstruction and classification. Unlike CAE, IP-CAE effectively leverages non-linear relationships and does not require retraining the jointly optimized decoder. Furthermore, our approach is, in principle, generalizable to Gumbel-Softmax distributions beyond feature selection.
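As an illustrative sketch of the indirect parameterization idea (the dimensions, the plain linear mapping, and all variable names are our assumptions; the actual IP-CAE trains these jointly with a decoder), the k Gumbel-Softmax selectors can draw their logits from a learned embedding times a shared mapping rather than from a free logit matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, m = 100, 10, 32   # input features, selected features, embedding size

# Direct CAE: a free logit matrix parameterizes k Gumbel-Softmax selectors.
logits_direct = rng.normal(size=(k, d))

# IP-CAE: the logits are produced indirectly from a learned embedding and
# a learned mapping to the distribution parameters.
embedding = rng.normal(size=(k, m))      # per-selector embedding
mapping = rng.normal(size=(m, d))        # shared mapping to logits
logits_indirect = embedding @ mapping    # (k, d), fed to Gumbel-Softmax

def gumbel_softmax(logits, tau=1.0, rng=rng):
    """Sample a relaxed one-hot feature selection for each selector row."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel noise
    z = (logits + g) / tau
    z = z - z.max(axis=1, keepdims=True)                  # stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

weights = gumbel_softmax(logits_indirect)  # rows are soft selections
```

In a full model, `weights @ x` would feed the decoder, and both `embedding` and `mapping` receive gradients through the same logits.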
Penalized Generative Variable Selection
Wang, Tong, Huang, Jian, Ma, Shuangge
Deep networks are increasingly applied to a wide variety of data, including data with high-dimensional predictors. In such analysis, variable selection may be needed alongside estimation/model building. Many of the existing deep network studies that incorporate variable selection have been limited to methodological and numerical developments. In this study, we consider modeling/estimation using conditional Wasserstein Generative Adversarial Networks. Group Lasso penalization is applied for variable selection, which may improve model estimation/prediction, interpretability, and stability. Significantly advancing from the existing literature, the analysis of censored survival data is also considered. We establish the convergence rate for variable selection while accounting for the approximation error, and obtain a more efficient distribution estimation. Simulations and the analysis of real experimental data demonstrate the satisfactory practical utility of the proposed analysis.
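A minimal sketch of how a Group Lasso penalty can perform variable selection in a network's first layer, grouping the weights by input variable (the function name and shapes are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def group_lasso_penalty(W, lam=0.1):
    """Group Lasso penalty on a first-layer weight matrix W (units x inputs).

    Each input variable j forms one group: the column W[:, j]. Penalizing
    the sum of column norms drives entire columns to zero, removing the
    variable from the network, i.e. performing variable selection.
    """
    return lam * np.linalg.norm(W, axis=0).sum()

# Toy example: variable 0 has been zeroed out, variable 1 is still active.
W = np.array([[0.0, 1.0],
              [0.0, 2.0]])
```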
Variable selection for nonlinear Cox regression model via deep learning
The variable selection problem for the nonlinear Cox regression model is considered. In survival analysis, one main objective is to identify the covariates that are associated with the risk of experiencing the event of interest. The Cox proportional hazards model is used extensively in survival analysis to study the relationship between survival times and covariates, where the model assumes that each covariate has a log-linear effect on the hazard function. However, this linearity assumption may not be satisfied in practice. In order to extract a representative subset of features, various variable selection approaches have been proposed for survival data under the linear Cox model, but there is little literature on variable selection for the nonlinear Cox model. To bridge this gap, we extend the recently developed deep learning-based variable selection model LassoNet to survival data. Simulations are provided to demonstrate the validity and effectiveness of the proposed method. Finally, we apply the proposed methodology to analyze a real data set on diffuse large B-cell lymphoma.
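For concreteness, here is a minimal NumPy sketch of the negative log partial likelihood that such a nonlinear Cox model minimizes, with a network output playing the role of the log-hazard. This assumes Breslow-style risk sets with no tied event times; the function name is illustrative:

```python
import numpy as np

def neg_log_partial_likelihood(risk, time, event):
    """Negative log partial likelihood of the Cox model (no ties).

    risk  : predicted log-hazard f(x_i), e.g. a neural network output
    time  : observed times
    event : 1 if the event occurred, 0 if censored
    """
    order = np.argsort(-time)                     # decreasing time
    risk, event = risk[order], event[order]
    log_risk_set = np.log(np.cumsum(np.exp(risk)))  # log sums over risk sets
    return -np.sum((risk - log_risk_set)[event == 1])
```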
Sparsity in Continuous-Depth Neural Networks
Aliee, Hananeh, Richter, Till, Solonin, Mikhail, Ibarra, Ignacio, Theis, Fabian, Kilbertus, Niki
Neural Ordinary Differential Equations (NODEs) have proven successful in learning dynamical systems in terms of accurately recovering the observed trajectories. While different types of sparsity have been proposed to improve robustness, the generalization properties of NODEs for dynamical systems beyond the observed data are underexplored. We systematically study the influence of weight and feature sparsity on forecasting as well as on identifying the underlying dynamical laws. Besides assessing existing methods, we propose a regularization technique to sparsify "input-output connections" and extract relevant features during training. Moreover, we curate real-world datasets consisting of human motion capture and human hematopoiesis single-cell RNA-seq data to realistically analyze different levels of out-of-distribution (OOD) generalization in forecasting and dynamics identification respectively. Our extensive empirical evaluation on these challenging benchmarks suggests that weight sparsity improves generalization in the presence of noise or irregular sampling. However, it does not prevent learning spurious feature dependencies in the inferred dynamics, rendering them impractical for predictions under interventions, or for inferring the true underlying dynamics. Instead, feature sparsity can indeed help with recovering sparse ground-truth dynamics compared to unregularized NODEs.
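One way to read off the "input-output connections" after training such a feature-sparsified model is to threshold the column norms of the dynamics network's first-layer weights; a hypothetical sketch (names and threshold are our assumptions):

```python
import numpy as np

def active_features(W_in, threshold=1e-3):
    """Identify which state variables actually drive the learned dynamics.

    W_in is the first-layer weight matrix (hidden x features) of the
    network f_theta in dx/dt = f_theta(x). A feature whose column norm
    falls below the threshold has no input-output connection and is
    treated as inactive in the inferred dynamics.
    """
    norms = np.linalg.norm(W_in, axis=0)
    return np.flatnonzero(norms > threshold)
```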
A neural network with feature sparsity
Lemhadri, Ismael, Ruan, Feng, Tibshirani, Robert
Deep learning has provided near-human performance on many prediction tasks (Geirhos et al., 2017) and left deep marks on entire fields of business and science, to the extent that large computational and engineering efforts are routinely dedicated to neural network training and optimization (Dean et al., 2012). However, neural networks are often criticized for their complexity and lack of interpretability. There are many arguments that favor simple models over more complex ones. In many applications, including healthcare (Ahmad et al., 2018; Cabitza et al., 2017), insurance and finance (Song et al., 2014; Thomas et al., 2002), and flight control and other safety-critical tasks (Kurd et al., 2007), interpretation of the underlying model is a critical requirement. On the other hand, traditional statistical tools, including simple linear models, remain popular because they are simple and explainable, with cheap, efficient computational tools readily available.