crash severity
ST-GraphNet: A Spatio-Temporal Graph Neural Network for Understanding and Predicting Automated Vehicle Crash Severity
Mimi, Mahmuda Sultana, Islam, Md Monzurul, Tusti, Anannya Ghosh, Somvanshi, Shriyank, Das, Subasish
Understanding the spatial and temporal dynamics of automated vehicle (AV) crash severity is critical for advancing urban mobility safety and infrastructure planning. In this work, we introduce ST-GraphNet, a spatio-temporal graph neural network framework designed to model and predict AV crash severity by using both fine-grained and region-aggregated spatial graphs. Using a balanced dataset of 2,352 real-world AV-related crash reports from Texas (2024), including geospatial coordinates, crash timestamps, SAE automation levels, and narrative descriptions, we construct two complementary graph representations: (1) a fine-grained graph with individual crash events as nodes, where edges are defined via spatio-temporal proximity; and (2) a coarse-grained graph where crashes are aggregated into Hexagonal Hierarchical Spatial Indexing (H3)-based spatial cells, connected through hexagonal adjacency. Each node in the graph is enriched with multimodal semantic, spatial, and temporal attributes, including textual embeddings of crash narratives from a pretrained Sentence-BERT model. We evaluate various graph neural network (GNN) architectures, such as Graph Convolutional Networks (GCN), Graph Attention Networks (GAT), and Dynamic Spatio-Temporal GCN (DSTGCN), to classify crash severity and predict high-risk regions. Our proposed ST-GraphNet, which utilizes a DSTGCN backbone on the coarse-grained H3 graph, achieves a test accuracy of 97.74%, substantially outperforming the best fine-grained model (64.7% test accuracy). These findings highlight the effectiveness of spatial aggregation, dynamic message passing, and multi-modal feature integration in capturing the complex spatio-temporal patterns underlying AV crash severity.
- North America > United States > California (0.14)
- North America > United States > Texas > Hays County > San Marcos (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Transportation > Ground > Road (1.00)
- Transportation > Infrastructure & Services (0.94)
- Information Technology (0.93)
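The fine-grained graph in the ST-GraphNet abstract links crash events by spatio-temporal proximity. The abstract does not give the rule or the thresholds, so the following is only a minimal sketch under assumed values: `MAX_KM`, `MAX_HOURS`, the record layout, and the function names are all hypothetical, and the authors' actual edge definition may differ.

```python
import math
from datetime import datetime

# Hypothetical thresholds; the abstract does not state the values used.
MAX_KM = 1.0
MAX_HOURS = 24.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def build_edges(crashes, max_km=MAX_KM, max_hours=MAX_HOURS):
    """Return undirected edges (i, j), i < j, for crashes close in both space and time."""
    edges = []
    for i in range(len(crashes)):
        for j in range(i + 1, len(crashes)):
            a, b = crashes[i], crashes[j]
            dist = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"])
            gap_hours = abs((a["time"] - b["time"]).total_seconds()) / 3600.0
            if dist <= max_km and gap_hours <= max_hours:
                edges.append((i, j))
    return edges


# Made-up crash records for illustration only.
crashes = [
    {"lat": 30.2672, "lon": -97.7431, "time": datetime(2024, 3, 1, 8, 0)},
    {"lat": 30.2700, "lon": -97.7400, "time": datetime(2024, 3, 1, 14, 0)},  # near in space and time
    {"lat": 30.2675, "lon": -97.7430, "time": datetime(2024, 6, 1, 8, 0)},   # near in space, far in time
    {"lat": 29.4241, "lon": -98.4936, "time": datetime(2024, 3, 1, 9, 0)},   # far in space
]
edges = build_edges(crashes)
```

With these made-up records only the first two crashes are joined, since every other pair fails either the distance or the time test. The pairwise loop is quadratic in the number of crashes; a spatial index would be the natural replacement for larger datasets.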
Crash Severity Prediction Using Deep Learning Approaches: A Hybrid CNN-RNN Framework
Accurate and timely prediction of crash severity is crucial in mitigating the severe consequences of traffic accidents. In order to provide appropriate levels of medical assistance and transportation services, an intelligent transportation system relies on effective prediction methods. Deep learning models have gained popularity in this domain due to their capability to capture non-linear relationships among variables. In this research, we have implemented a hybrid CNN-RNN deep learning model for crash severity prediction and compared its performance against widely used statistical and machine learning models such as logistic regression, naïve Bayes classifier, K-Nearest Neighbors (KNN), decision tree, and individual deep learning models: RNN and CNN. This study employs a methodology that considers the interconnected relationships between various features of traffic accidents. The study was conducted using a dataset of 15,870 accident records gathered over a period of seven years between 2015 and 2021 on Virginia highway I-64. The findings demonstrate that the proposed CNN-RNN hybrid model has outperformed all benchmark models in terms of predicting crash severity. This result illustrates the effectiveness of the hybrid model as it combines the advantages of both RNN and CNN models in order to achieve greater accuracy in the prediction process.
- North America > United States > Virginia (0.25)
- Africa > Ethiopia (0.04)
- North America > United States > Texas > Bexar County > San Antonio (0.04)
- Transportation > Ground > Road (1.00)
- Government (1.00)
- Transportation > Infrastructure & Services (0.66)
A Dimensionality-Reduced XAI Framework for Roundabout Crash Severity Insights
Chakraborty, Rohit, Das, Subasish
Roundabouts reduce severe crashes, yet risk patterns vary by conditions. This study analyzes 2017-2021 Ohio roundabout crashes using a two-step, explainable workflow. Cluster Correspondence Analysis (CCA) identifies co-occurring factors and yields four crash patterns. A tree-based severity model is then interpreted with SHAP to quantify drivers of injury within and across patterns. Results show higher severity when darkness, wet surfaces, and higher posted speeds coincide with fixed-object or angle events, and lower severity in clear, low-speed settings. Pattern-specific explanations highlight mechanisms at entries (fail-to-yield, gap acceptance), within multi-lane circulation (improper maneuvers), and during slow-downs (rear-end). The workflow links pattern discovery with case-level explanations, supporting site screening, countermeasure selection, and audit-ready reporting. The contribution to Information Systems is a practical template for usable XAI in public safety analytics.
- North America > United States > Ohio (0.25)
- North America > United States > Michigan (0.05)
- North America > United States > Texas > Hays County > San Marcos (0.04)
- North America > United States > Louisiana (0.04)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
- Government > Regional Government > North America Government > United States Government (0.46)
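The roundabout study interprets a tree-based model with SHAP, which attributes a prediction to individual features using Shapley values. As a rough illustration of what those attributions mean, the sketch below computes exact Shapley values by enumerating feature coalitions. It does not use the SHAP library or a tree model: `toy_severity`, its coefficients, and the three binary features are hypothetical stand-ins, and absent features are simply set to a baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy severity score over 0/1 flags [dark, wet, high_speed].
def toy_severity(z):
    dark, wet, high_speed = z
    return 0.1 + 0.2 * dark + 0.1 * wet + 0.3 * high_speed + 0.2 * dark * high_speed


x = [1, 0, 1]          # a dark, dry, high-speed crash
baseline = [0, 0, 0]   # reference case with every flag off
phi = shapley_values(toy_severity, x, baseline)
```

In this toy case the interaction term between darkness and high speed is split evenly between the two features, and the attributions sum to the difference between the score for `x` and the score for the baseline.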
Tabular Data with Class Imbalance: Predicting Electric Vehicle Crash Severity with Pretrained Transformers (TabPFN) and Mamba-Based Models
Somvanshi, Shriyank, Hebli, Pavan, Chhetri, Gaurab, Das, Subasish
This study presents a deep tabular learning framework for predicting crash severity in electric vehicle (EV) collisions using real-world crash data from Texas (2017-2023). After filtering for electric-only vehicles, 23,301 EV-involved crash records were analyzed. Feature importance techniques using XGBoost and Random Forest identified intersection relation, first harmful event, person age, crash speed limit, and day of week as the top predictors, along with advanced safety features like automatic emergency braking. To address class imbalance, Synthetic Minority Over-sampling Technique and Edited Nearest Neighbors (SMOTEENN) resampling was applied. Three state-of-the-art deep tabular models, TabPFN, MambaNet, and MambaAttention, were benchmarked for severity prediction. While TabPFN demonstrated strong generalization, MambaAttention achieved superior performance in classifying severe injury cases due to its attention-based feature reweighting. The findings highlight the potential of deep tabular architectures for improving crash severity prediction and enabling data-driven safety interventions in EV crash contexts.
- Asia > Philippines > Luzon > National Capital Region > City of Manila (0.14)
- North America > United States > Texas > Hays County > San Marcos (0.04)
- North America > United States > Virginia (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- Transportation > Ground > Road (1.00)
- Transportation > Electric Vehicle (1.00)
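The SMOTEENN resampling named in the EV abstract combines SMOTE oversampling with Edited Nearest Neighbors cleaning. The sketch below illustrates only the oversampling half, in plain Python: each synthetic point is interpolated between a minority-class sample and one of its nearest minority neighbours. The function name, `k`, the seed, and the sample values are hypothetical; a real pipeline would more likely rely on a maintained resampling library.

```python
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by neighbour interpolation (SMOTE step only)."""
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the k nearest other minority points.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: dist2(x, minority[j]),
        )[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(u + gap * (v - u) for u, v in zip(x, nb)))
    return synthetic


# Hypothetical minority-class samples: (crash speed limit in mph, person age).
severe = [(55.0, 22.0), (60.0, 30.0), (65.0, 27.0), (70.0, 41.0)]
new_points = smote_oversample(severe, n_new=5)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the range already covered by that class. The ENN step, which removes samples whose neighbours mostly belong to another class, is omitted here.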
Predicting and Explaining Traffic Crash Severity Through Crash Feature Selection
Castellani, Andrea, Papadovasilakis, Zacharias, Papoutsoglou, Giorgos, Cole, Mary, Bautsch, Brian, Rodemann, Tobias, Tsamardinos, Ioannis, Harden, Angela
Motor vehicle crashes remain a leading cause of injury and death worldwide, necessitating data-driven approaches to understand and mitigate crash severity. This study introduces a curated dataset of more than 3 million people involved in accidents in Ohio over six years (2017-2022), aggregated to more than 2.3 million vehicle-level records for predictive analysis. The primary contribution is a transparent and reproducible methodology that combines Automated Machine Learning (AutoML) and explainable artificial intelligence (AI) to identify and interpret key risk factors associated with severe crashes. Using the JADBio AutoML platform, predictive models were constructed to distinguish between severe and non-severe crash outcomes. The models underwent rigorous feature selection across stratified training subsets, and their outputs were interpreted using SHapley Additive exPlanations (SHAP) to quantify the contribution of individual features. A final Ridge Logistic Regression model achieved an AUC-ROC of 85.6% on the training set and 84.9% on a hold-out test set, with 17 features consistently identified as the most influential predictors. Key features spanned demographic, environmental, vehicle, human, and operational categories, including location type, posted speed, minimum occupant age, and pre-crash action. Notably, certain traditionally emphasized factors, such as alcohol or drug impairment, were less influential in the final model compared to environmental and contextual variables. Emphasizing methodological rigor and interpretability over mere predictive performance, this study offers a scalable framework to support Vision Zero with aligned interventions and advanced data-informed traffic safety policy.
- North America > United States > Ohio (0.26)
- Europe > Greece (0.04)
- South America > Colombia (0.04)
- Transportation > Ground > Road (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Health & Medicine (1.00)
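The Ohio study reports model quality as AUC-ROC. One way to read that metric is as the probability that a randomly chosen severe crash receives a higher score than a randomly chosen non-severe one. The sketch below computes it directly from that definition; the labels and scores are made up for illustration and have no connection to the study's data.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = severe) and model scores.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]
auc = auc_roc(labels, scores)
```

Here eight of the nine positive-negative pairs are ordered correctly, so the result is 8/9. The pairwise form is quadratic; rank-based formulas give the same value more efficiently on large datasets.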
Investigating Robotaxi Crash Severity with Geographical Random Forest and the Urban Environment
Jiao, Junfeng, Baik, Seung Gyu, Choi, Seung Jun, Xu, Yiming
This paper quantitatively investigates the crash severity of Autonomous Vehicles (AVs) with spatially localized machine learning and macroscopic measures of the urban built environment. Extending beyond the microscopic effects of individual infrastructure elements, we focus on the city-scale land use and behavioral patterns, while addressing spatial heterogeneity and spatial autocorrelation. We implemented a spatially localized machine learning technique called Geographical Random Forest (GRF) on the California AV collision dataset. Analyzing multiple urban measures, including points of interest, building footprint, and land use, we built a GRF model and visualized it as a crash severity risk map of San Francisco. This paper presents three findings. First, spatially localized machine learning outperformed regular machine learning in predicting AV crash severity. The bias-variance tradeoff was evident as we adjusted the localization weight hyperparameter. Second, land use was the most important predictor, compared to intersections, building footprints, public transit stops, and Points Of Interest (POIs). Third, AV crashes were more likely to result in low-severity incidents in city center areas with greater diversity and commercial activities than in residential neighborhoods. Counterintuitively, residential areas were associated with higher crash severity than more complex areas such as commercial and mixed-use areas, likely due to human behavior and less restrictive environments. When robotaxi operators train their AV systems, it is recommended to: (1) consider where their fleet operates and make localized algorithms for their perception system, and (2) design safety measures specific to residential neighborhoods, such as slower driving speeds and more alert sensors.
- North America > United States > California > San Francisco County > San Francisco (0.25)
- North America > United States > Texas > Travis County > Austin (0.14)
- Europe > Netherlands (0.04)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.68)
- Transportation > Passenger (1.00)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
Applying MambaAttention, TabPFN, and TabTransformers to Classify SAE Automation Levels in Crashes
Somvanshi, Shriyank, Tusti, Anannya Ghosh, Mimi, Mahmuda Sultana, Islam, Md Monzurul, Polock, Sazzad Bin Bashar, Dutta, Anandi, Das, Subasish
The increasing presence of automated vehicles (AVs) presents new challenges for crash classification and safety analysis. Accurately identifying the SAE automation level involved in each crash is essential to understanding crash dynamics and system accountability. However, existing approaches often overlook automation-specific factors and lack the model sophistication to capture distinctions between different SAE levels. To address this gap, this study evaluates the performance of three advanced tabular deep learning models (MambaAttention, TabPFN, and TabTransformer) for classifying SAE automation levels using structured crash data from Texas (2024), covering 4,649 cases categorized as Assisted Driving (SAE Level 1), Partial Automation (SAE Level 2), and Advanced Automation (SAE Levels 3-5 combined). Following class balancing using SMOTEENN, the models were trained and evaluated on a unified dataset of 7,300 records. MambaAttention demonstrated the highest overall performance (F1-scores: 88% for SAE 1, 97% for SAE 2, and 99% for SAE 3-5), while TabPFN excelled in zero-shot inference with high robustness for rare crash categories. In contrast, TabTransformer underperformed, particularly in detecting Partial Automation crashes (F1-score: 55%), suggesting challenges in modeling shared human-system control dynamics. These results highlight the capability of deep learning models tailored for tabular data to enhance the accuracy and efficiency of automation-level classification. Integrating such models into crash analysis frameworks can support policy development, AV safety evaluation, and regulatory decisions, especially in distinguishing high-risk conditions for mid- and high-level automation technologies.
- North America > United States > Texas > Hays County > San Marcos (0.05)
- North America > United States > West Virginia (0.04)
- North America > United States > California (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.67)
- Transportation > Ground > Road (1.00)
- Government (1.00)
- Transportation > Passenger (0.93)
Applying Tabular Deep Learning Models to Estimate Crash Injury Types of Young Motorcyclists
Somvanshi, Shriyank, Tusti, Anannya Ghosh, Chakraborty, Rohit, Das, Subasish
Young motorcyclists, particularly those aged 15 to 24 years old, face a heightened risk of severe crashes due to factors such as speeding, traffic violations, and helmet usage. This study aims to identify key factors influencing crash severity by analyzing 10,726 young motorcyclist crashes in Texas from 2017 to 2022. Two advanced tabular deep learning models, ARMNet and MambaNet, were employed, using an advanced resampling technique to address class imbalance. The models were trained to classify crashes into three severity levels: Fatal or Severe, Moderate or Minor, and No Injury. ARMNet achieved an accuracy of 87 percent, outperforming MambaNet's 86 percent, with both models excelling in predicting severe and no injury crashes while facing challenges in moderate crash classification. Key findings highlight the significant influence of demographic, environmental, and behavioral factors on crash outcomes. The study underscores the need for targeted interventions, including stricter helmet enforcement and educational programs customized to young motorcyclists. These insights provide valuable guidance for policymakers in developing evidence-based strategies to enhance motorcyclist safety and reduce crash severity.
- North America > United States > Texas (0.36)
- Africa > Nigeria (0.14)
- North America > United States > New York (0.14)
- Health & Medicine (1.00)
- Transportation > Ground > Road (0.96)
- Government (0.88)
Crash Severity Analysis of Child Bicyclists using Arm-Net and MambaNet
Somvanshi, Shriyank, Chakraborty, Rohit, Das, Subasish, Dutta, Anandi K
Child bicyclists (14 years and younger) are among the most vulnerable road users, often experiencing severe injuries or fatalities in crashes. This study analyzed 2,394 child bicyclist crashes in Texas from 2017 to 2022 using two deep tabular learning models (ARM-Net and MambaNet). To address the issue of data imbalance, the SMOTEENN technique was applied, resulting in balanced datasets that facilitated accurate crash severity predictions across three categories: Fatal/Severe (KA), Moderate/Minor (BC), and No Injury (O). The findings revealed that MambaNet outperformed ARM-Net, achieving higher precision, recall, F1-scores, and accuracy, particularly in the KA and O categories. Both models highlighted challenges in distinguishing BC crashes due to overlapping characteristics. These insights underscored the value of advanced tabular deep learning methods and balanced datasets in understanding crash severity. While limitations such as reliance on categorical data exist, future research could explore continuous variables and real-time behavioral data to enhance predictive modeling and crash mitigation strategies.
- North America > United States > Texas (0.37)
- Asia > Middle East (0.14)
- Leisure & Entertainment > Sports > Cycling (0.88)
- Transportation (0.69)
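The child bicyclist study compares models on precision, recall, and F1-score across the KA, BC, and O severity categories. As a hedged illustration of how such per-class figures are derived from predictions, the sketch below counts true positives, false positives, and false negatives for each class. The label sequences are made up and do not reproduce any result from the study; `per_class_f1` is a hypothetical helper name.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, classes):
    """Return precision, recall, and F1 for each class from paired label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    report = {}
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        report[c] = {"precision": prec, "recall": rec, "f1": f1}
    return report


# Made-up labels: KA = Fatal/Severe, BC = Moderate/Minor, O = No Injury.
y_true = ["KA", "KA", "BC", "BC", "BC", "O", "O", "O"]
y_pred = ["KA", "BC", "BC", "KA", "O", "O", "O", "BC"]
report = per_class_f1(y_true, y_pred, ["KA", "BC", "O"])
```

In this toy example the BC class scores lowest because it is confused with both neighbouring categories, which mirrors the kind of overlap the abstract describes without reproducing its numbers.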
Feature Group Tabular Transformer: A Novel Approach to Traffic Crash Modeling and Causality Analysis
Lares, Oscar, Zhen, Hao, Yang, Jidong J.
Reliable and interpretable traffic crash modeling is essential for understanding causality and improving road safety. This study introduces a novel approach to predicting collision types by utilizing a comprehensive dataset fused from multiple sources, including weather data, crash reports, high-resolution traffic information, pavement geometry, and facility characteristics. Central to our approach is the development of a Feature Group Tabular Transformer (FGTT) model, which organizes disparate data into meaningful feature groups, represented as tokens. These group-based tokens serve as rich semantic components, enabling effective identification of collision patterns and interpretation of causal mechanisms. The FGTT model is benchmarked against widely used tree ensemble models, including Random Forest, XGBoost, and CatBoost, demonstrating superior predictive performance. Furthermore, model interpretation reveals key influential factors, providing fresh insights into the underlying causality of distinct crash types.
- North America > United States > Georgia > Clarke County > Athens (0.14)
- North America > United States > New Jersey (0.04)
- Research Report > Promising Solution (1.00)
- Research Report > New Finding (1.00)
- Overview > Innovation (0.71)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks (1.00)
- Government (0.93)