Regression is arguably the workhorse of statistics. Despite its popularity, however, it may also be the most misunderstood. The Dependent Variable is something you want to predict or explain. In a Marketing Research context it might be Purchase Interest measured on a 0-10 rating scale. The Independent Variable is what you use to explain or predict the Dependent Variable.
In this paper we adopted state-of-the-art machine learning algorithms, namely: random forest (RF) and least squares boosting, to model crash data and identify the optimum model to study the impact of narrow lanes on the safety of arterial roads. Using a ten-year crash dataset in four cities in Nebraska, two machine learning models were assessed based on the prediction error. The RF model was identified as the best model. The RF was used to compute the importance of the lane width predictors in our regression model based on two different measures. Subsequently, the RF model was used to simulate the crash rate for different lane widths. The Kruskal-Wallis test, was then conducted to determine if simulated values from the four lane width groups have equal means. The test null hypothesis of equal means for simulated values from the four lane width groups was rejected. Consequently, it was concluded that the crash rates from at least one lane width group was statistically different from the others. Finally, the results from the pairwise comparisons using the Tukey and Kramer test showed that the changes in crash rates between any two lane width conditions were statistically significant.
This contribution is from David Corliss. David teaches a class on this subject, giving a (very brief) description of 23 regression methods in just an hour, with an example and the package and procedures used for each case. Here you can check the webcast done for Central Michigan University. For instance, I would add piecewise linear regression, as well as regression on unusual domains (on a sphere or on the simplex.)