In this paper, we adopted two state-of-the-art machine learning algorithms, namely random forest (RF) and least-squares boosting, to model crash data and identify the optimum model for studying the impact of narrow lanes on the safety of arterial roads. Using a ten-year crash dataset from four cities in Nebraska, the two models were assessed based on their prediction error, and the RF model was identified as the best model. The RF model was then used to compute the importance of the lane width predictors in our regression model based on two different importance measures. Subsequently, the RF model was used to simulate the crash rate for different lane widths. The Kruskal-Wallis test was then conducted to determine whether the simulated values from the four lane width groups come from the same distribution. This null hypothesis was rejected; consequently, it was concluded that the crash rates of at least one lane width group were statistically different from the others. Finally, pairwise comparisons using the Tukey-Kramer test showed that the changes in crash rates between any two lane width conditions were statistically significant.
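The Kruskal-Wallis step can be illustrated with a small, dependency-free sketch. It ranks the pooled simulated values (averaging ranks for ties), computes the tie-corrected H statistic, and converts it to a p-value using the usual chi-square approximation with k - 1 degrees of freedom (df = 3 for four lane width groups). This is a minimal sketch under those assumptions, not the authors' code; the function names are hypothetical, and the chi-square approximation is generally trusted only for reasonably sized groups (a common rule of thumb is at least five observations per group). In practice, `scipy.stats.kruskal` performs the same tie-corrected test.

```python
import math
from collections import Counter
from itertools import chain
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with integer df.

    Evaluates the regularized upper incomplete gamma Q(df/2, x/2), starting
    from Q(1, y) = exp(-y) or Q(1/2, y) = erfc(sqrt(y)) and applying
    Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1).
    """
    y = x / 2.0
    if df % 2 == 0:
        s, q = 1.0, math.exp(-y)
    else:
        s, q = 0.5, math.erfc(math.sqrt(y))
    while s < df / 2.0:
        q += y ** s * math.exp(-y) / math.gamma(s + 1.0)
        s += 1.0
    return q


def kruskal_wallis(*groups: Sequence[float]) -> Tuple[float, float]:
    """Tie-corrected Kruskal-Wallis H and its chi-square p-value."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    # Sum of R_i^2 / n_i, where R_i is the rank sum of group i.
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    # Divide by the tie correction 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - ties / (n ** 3 - n)
    if correction == 0.0:
        raise ValueError("all observations are identical")
    h = max(h / correction, 0.0)  # guard against tiny negative rounding error
    return h, chi2_sf(h, len(groups) - 1)
```

Calling `kruskal_wallis(rates_a, rates_b, rates_c, rates_d)` on four lists of simulated crash rates returns `(H, p)`; a p-value below the chosen significance level rejects the null hypothesis that all groups share one distribution. The pairwise Tukey-Kramer follow-up needs critical values from the studentized range distribution, which this sketch does not implement; SciPy provides that distribution as `scipy.stats.studentized_range` (SciPy 1.7 and later).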
Abstract: Predicting traffic incident duration is a major challenge for many traffic centres around the world. Most research studies focus on predicting incident duration on motorways rather than on arterial roads, owing to the higher network complexity of arterial roads and the lack of data. In this paper, we propose a bi-level framework for predicting accident duration on the arterial road network in Sydney, based on the operational requirement of an incident clearance target of less than 45 minutes. Using incident baseline information, we first deploy a classification method using various ensemble tree models to predict whether a new incident will be cleared in less than 45 minutes. If the incident is classified as short-term, various regression models are then developed to predict the actual incident duration in minutes by incorporating various traffic flow features. After outlier removal and intensive hyper-parameter tuning through randomized search and cross-validation, we show that the extreme gradient boosting (XGBoost) approach outperformed all other models, including gradient-boosted decision trees, by almost 53%. Finally, we perform a feature importance evaluation for incident duration prediction and show that the best prediction results are obtained when leveraging real-time traffic flow on road sections in the vicinity of the reported accident location.
1. Introduction
Traffic congestion is a major concern for many cities.
Initial methods used to predict the incident duration were Bayesian classifiers, discrete choice models (DCM), probabilistic distribution analyses, and hazard-based duration models (HBDM).
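The bi-level framework described in the abstract can be sketched as a two-stage router: a level-1 classifier decides from baseline information alone whether an incident will clear within the 45-minute target, and only incidents routed as short-term reach a level-2 regressor that also sees traffic flow features. This is a minimal sketch, not the authors' implementation. The function and feature names (`predict_bi_level`, `lanes_blocked`, `upstream_flow_vph`), the 0.5 probability threshold, and the toy stand-in models are all hypothetical; in practice `classify` and `regress` would be the tuned ensemble tree models (for example, XGBoost) trained on the incident data.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

# Operational clearance target stated in the abstract (minutes).
CLEARANCE_TARGET_MIN = 45.0

Features = Mapping[str, float]


@dataclass
class BiLevelPrediction:
    short_term: bool               # level 1: expected to clear in < 45 min?
    duration_min: Optional[float]  # level 2: only set for short-term incidents


def make_label(duration_min: float, target: float = CLEARANCE_TARGET_MIN) -> int:
    """Level-1 training label: 1 if the incident cleared within the target."""
    return int(duration_min < target)


def predict_bi_level(
    baseline: Features,
    traffic_flow: Features,
    classify: Callable[[Features], float],
    regress: Callable[[Features], float],
    threshold: float = 0.5,  # assumed decision threshold on P(short-term)
) -> BiLevelPrediction:
    # Level 1: classify using incident baseline information only.
    p_short = classify(baseline)
    if p_short < threshold:
        return BiLevelPrediction(short_term=False, duration_min=None)
    # Level 2: regress the duration, adding real-time traffic flow features.
    combined = {**baseline, **traffic_flow}
    return BiLevelPrediction(short_term=True, duration_min=regress(combined))


# Hypothetical stand-ins for the trained ensemble models, for illustration only.
def toy_classifier(x: Features) -> float:
    return 0.8 if x.get("lanes_blocked", 0) <= 1 else 0.2


def toy_regressor(x: Features) -> float:
    return 15.0 + 5.0 * x.get("lanes_blocked", 0) + 0.01 * x.get("upstream_flow_vph", 0)
```

For example, `predict_bi_level({"lanes_blocked": 1}, {"upstream_flow_vph": 1000.0}, toy_classifier, toy_regressor)` routes the incident to level 2 and returns a predicted duration of about 30 minutes, whereas an incident with three blocked lanes is flagged as long-term with no duration estimate. Keeping the two levels as separate callables lets each model be tuned independently (for instance, by randomized search with cross-validation), and the regressor is trained and evaluated only on incidents that level 1 would route to it.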