Predicting flu deaths with R
As Google learned, predicting the spread of influenza, even with mountains of data, is notoriously difficult. Nonetheless, bioinformatician and R user Shirin Glander has created a two-part tutorial about predicting flu deaths with R (part 2 here). The analysis is based on just 136 cases of influenza A H7N9 in China in 2013 (data provided in the outbreaks package) so the intent was not to create a generally predictive model, but by providing all of the R code and graphics Shirin has created a useful example of real-word predictive modeling with R. The tutorial covers loading and cleaning the data (including a nice example of using the mice package to impute missing values) and begins with some exploratory data visualizations. I was particularly impressed by the use of density charts (using the stat_density2d ggplot2 aesthetic) to highlight differences in the scatterplots of flu cases ending in death and recovery. Decision trees (implemented using rpart and visualized using fancyRpartPlot from the rattle package) Random Forests (using caret's "rf" training method) Elastic-Net Regularized Generalized Linear Models (using caret's "glmnet" training method) K-nearest neighbors clustering (using caret's "kknn" training method) Penalized Discriminant Analysis (using caret's "pda" training method) and in Part 2, Extreme gradient boosting using the xgboost package and various preprocessing techniques from the caret package Due to the limited data size, there's not too much difference between the models: in each case, 13-15 of the 23 cases were classified correctly.
Dec-18-2016, 06:10:26 GMT