 regression variable


What is in the model? A Comparison of variable selection criteria and model search approaches

Xu, Shuangshuang, Ferreira, Marco A. R., Tegge, Allison N.

arXiv.org Machine Learning

For many scientific questions, understanding the underlying mechanism is the goal. To help investigators better understand the underlying mechanism, variable selection is a crucial step that permits the identification of the regression variables most strongly associated with the outcome of interest. A variable selection method consists of model evaluation using an information criterion and a search of the model space. Here, we provide a comprehensive comparison of variable selection methods using three performance measures: correct identification rate (CIR), recall, and false discovery rate (FDR). We consider the BIC and AIC for evaluating models, and exhaustive, greedy, LASSO path, and stochastic search approaches for searching the model space; we also consider LASSO using cross-validation. We perform simulation studies for linear and generalized linear models that parametrically explore a wide range of realistic sample sizes, effect sizes, and correlations among regression variables. We consider model spaces with both small and large numbers of potential regressors. The results show that exhaustive search with BIC and stochastic search with BIC outperform the other methods on small and large model spaces, respectively. These approaches result in the highest CIR and lowest FDR, which collectively may support long-term efforts towards increasing replicability in research.
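The three performance measures named in the abstract can be illustrated with a minimal sketch. It assumes that CIR is the share of simulated datasets in which the selected model matches the true model exactly, and that recall and FDR are averaged over datasets; the abstract does not spell out these definitions, so treat them as assumptions. The function names and the toy variable names (`x1`, `x2`, `x3`) are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def recall_and_fdr(selected: Set[str], truth: Set[str]) -> Tuple[float, float]:
    """Recall and false discovery rate for one selected model."""
    true_pos = len(selected & truth)
    false_pos = len(selected - truth)
    # Convention (assumption): recall is 1 when there is nothing to find,
    # and FDR is 0 when nothing was selected.
    recall = true_pos / len(truth) if truth else 1.0
    fdr = false_pos / len(selected) if selected else 0.0
    return recall, fdr


def summarize(selections: List[Set[str]], truth: Set[str]) -> Dict[str, float]:
    """Average the measures over the models selected on several datasets."""
    n = len(selections)
    pairs = [recall_and_fdr(s, truth) for s in selections]
    return {
        # Assumed definition: fraction of datasets with an exact match.
        "CIR": sum(s == truth for s in selections) / n,
        "recall": sum(r for r, _ in pairs) / n,
        "FDR": sum(f for _, f in pairs) / n,
    }


# Toy illustration: the true model contains x1 and x2.
truth = {"x1", "x2"}
selections = [{"x1", "x2"}, {"x1"}, {"x1", "x2", "x3"}, {"x1", "x2"}]
result = summarize(selections, truth)
print(result)
```

In this toy run, the second selection lowers recall (it misses `x2`) and the third raises FDR (it adds `x3`); under the assumed definition, only exact matches count towards CIR.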


How To Predict Multiple Variables With One Model? And Why!

#artificialintelligence

When we start working with TensorFlow, we usually create models with the sequential format of the Keras library. Sequential models can solve many problems in all fields of deep learning, whether image recognition or classification, natural language processing, or series forecasting… They are powerful enough to be used in the large majority of problems. But there are times when we need to go a little further in using Keras with TensorFlow. In those cases we can use the API for model creation, which opens up many possibilities that sequential models do not offer.
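As a rough sketch of what predicting multiple variables with one model can look like, the example below assumes that "the API for model creation" refers to the Keras functional API. The input size, layer widths, output names (`price`, `category`), and random data are all hypothetical and chosen only for illustration.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# One shared input and hidden layer (sizes are illustrative assumptions).
inputs = tf.keras.Input(shape=(8,), name="features")
hidden = layers.Dense(16, activation="relu")(inputs)

# Two output heads: a regression target and a 3-class target.
price = layers.Dense(1, name="price")(hidden)
category = layers.Dense(3, activation="softmax", name="category")(hidden)

# The functional API lets a single model expose several outputs.
model = tf.keras.Model(inputs=inputs, outputs=[price, category])
model.compile(
    optimizer="adam",
    # One loss per output, keyed by the output layer's name.
    loss={"price": "mse", "category": "sparse_categorical_crossentropy"},
)

# Random placeholder data, only to show the expected shapes.
x = np.random.rand(32, 8).astype("float32")
y_price = np.random.rand(32, 1).astype("float32")
y_category = np.random.randint(0, 3, size=(32,))

model.fit(x, {"price": y_price, "category": y_category}, epochs=1, verbose=0)
pred_price, pred_category = model.predict(x, verbose=0)
```

A sequential model cannot express this branching, because it stacks layers in a single chain with one input and one output.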



FREEtree: A Tree-based Approach for High Dimensional Longitudinal Data With Correlated Features

Xu, Yuancheng, Zafirov, Athanasse, Alvarez, R. Michael, Kojis, Dan, Tan, Min, Ramirez, Christina M.

arXiv.org Machine Learning

This paper proposes FREEtree, a tree-based method for high dimensional longitudinal data with correlated features. Popular machine learning approaches commonly used for variable selection, like Random Forests, do not perform well when there are correlated features and do not account for data observed over time. FREEtree deals with longitudinal data by using a piecewise random effects model. It also exploits the network structure of the features by first clustering them using weighted correlation network analysis (WGCNA). It then conducts a screening step within each cluster of features and a selection step among the surviving features, which provides a relatively unbiased way to select features. By using dominant principal components as regression variables at each leaf and the original features as splitting variables at splitting nodes, FREEtree maintains its interpretability and improves its computational efficiency. The simulation results show that FREEtree outperforms other tree-based methods in terms of prediction accuracy, feature selection accuracy, and the ability to recover the underlying structure.