
Collaborating Authors

 Comert, Gurcan


Unraveling Pedestrian Fatality Patterns: A Comparative Study with Explainable AI

arXiv.org Artificial Intelligence

Road fatalities pose significant public safety and health challenges worldwide, with pedestrians being particularly vulnerable in vehicle-pedestrian crashes due to disparities in physical and performance characteristics. This study employs explainable artificial intelligence (XAI) to identify key factors contributing to pedestrian fatalities across the five U.S. states with the highest crash rates (2018-2022) and compares them with the five states with the lowest fatality rates. Using data from the Fatality Analysis Reporting System (FARS), the study applies machine learning techniques, including Decision Trees, Gradient Boosting Trees, Random Forests, and XGBoost, to predict factors contributing to pedestrian fatalities. To address data imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) is utilized, while SHapley Additive Explanations (SHAP) values enhance model interpretability. The results indicate that age, alcohol and drug use, location, and environmental conditions are significant predictors of pedestrian fatalities. The XGBoost model outperformed the others, achieving a balanced accuracy of 98%, accuracy of 90%, precision of 92%, recall of 90%, and an F1 score of 91%. Findings reveal that pedestrian fatalities are more common at mid-block locations and in areas with poor visibility, with older adults and substance-impaired individuals at higher risk. These insights can inform policymakers and urban planners in implementing targeted safety measures, such as improved lighting, enhanced pedestrian infrastructure, and stricter traffic law enforcement, to reduce fatalities and improve public safety.
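The pipeline described above (SMOTE for class imbalance, a gradient-boosted classifier, and SHAP for attribution) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the file name, feature columns, and hyperparameters are assumptions.

```python
# Minimal sketch of a SMOTE + XGBoost + SHAP pipeline (illustrative only).
# Assumes a hypothetical, numerically encoded FARS extract with a binary label.
import pandas as pd
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE
from xgboost import XGBClassifier
import shap

df = pd.read_csv("fars_pedestrian.csv")              # placeholder file name
X = df.drop(columns=["fatal"])                       # crash, person, environment features
y = df["fatal"]                                      # 1 = pedestrian fatality

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Oversample the minority class on the training split only.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

model = XGBClassifier(n_estimators=300, max_depth=6, eval_metric="logloss")
model.fit(X_res, y_res)

# SHAP values quantify each feature's contribution to individual predictions.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test)
```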


Analyzing Factors Influencing Driver Willingness to Accept Advanced Driver Assistance Systems

arXiv.org Artificial Intelligence

Hannah Musau (a), Nana Kankam Gyimah (a), Judith Mwakalonge (a), Gurcan Comert (b), Saidi Siuhi (a); (a) Department of Engineering, South Carolina State University, Orangeburg, South Carolina, USA 29117; (b) Department of Computational Engineering and Data Science, North Carolina A&T State University, Greensboro, North Carolina, USA 27411.

Advanced Driver Assistance Systems (ADAS) enhance highway safety by improving environmental perception and reducing human errors. However, misconceptions, trust issues, and knowledge gaps hinder widespread adoption. This study examines driver perceptions, knowledge sources, and usage patterns of ADAS in passenger vehicles. A nationwide survey collected data from a diverse sample of U.S. drivers. Machine learning models predicted ADAS adoption, with SHAP (SHapley Additive Explanations) identifying the key influencing factors. Findings indicate that higher trust levels correlate with increased ADAS usage, while concerns about reliability remain a barrier. The results also emphasize the influence of socioeconomic, demographic, and behavioral factors on ADAS adoption, offering guidance for automakers, policymakers, and safety advocates seeking to improve awareness, trust, and usability. Human factors are the leading cause of road crashes, contributing to over 90% of incidents either alone or alongside failures in vehicles or infrastructure [1].
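As a rough illustration of the prediction-plus-attribution step, the sketch below trains a gradient-boosted classifier on an already-encoded survey table and ranks factors by mean absolute SHAP value. The file name and column names are hypothetical, not the study's actual variables.

```python
# Sketch: predict ADAS adoption and rank influencing factors by mean |SHAP|.
# The encoded survey file and column names are placeholders.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
import shap

survey = pd.read_csv("adas_survey_encoded.csv")       # hypothetical encoded survey
X = survey[["trust_level", "age_group", "income_bracket", "annual_mileage"]]
y = survey["uses_adas"]                               # 1 = regular ADAS user

clf = GradientBoostingClassifier(random_state=0).fit(X, y)

# For a binary gradient-boosted model, TreeExplainer returns one array of
# per-feature log-odds contributions per sample.
shap_values = shap.TreeExplainer(clf).shap_values(X)
ranking = pd.Series(np.abs(shap_values).mean(axis=0), index=X.columns)
print(ranking.sort_values(ascending=False))
```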


A Multi-Scale Isolation Forest Approach for Real-Time Detection and Filtering of FGSM Adversarial Attacks in Video Streams of Autonomous Vehicles

arXiv.org Artificial Intelligence

Deep Neural Networks (DNNs) have demonstrated remarkable success across a wide range of tasks, particularly in fields such as image classification. However, DNNs are highly susceptible to adversarial attacks, where subtle perturbations introduced to input images lead to erroneous model outputs. In today's digital era, ensuring the security and integrity of images processed by DNNs is of critical importance. One of the most prominent adversarial attack methods is the Fast Gradient Sign Method (FGSM), which perturbs images in the direction of the loss gradient to deceive the model. This paper presents a novel approach for detecting and filtering FGSM adversarial attacks in image processing tasks. We evaluate the proposed method on 10,000 images, each subjected to five levels of perturbation, characterized by $\epsilon$ values of 0.01, 0.02, 0.05, 0.1, and 0.2, applied in the direction of the loss gradient. We demonstrate that our approach effectively filters adversarially perturbed images, mitigating the impact of FGSM attacks. The method is implemented in Python, and the source code is publicly available on GitHub for reproducibility and further research.
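The paper's multi-scale detector is not reproduced here; the sketch below only illustrates the two ingredients named in the abstract, an FGSM perturbation in the direction of the loss gradient and an Isolation Forest fit on simple per-image statistics, using a toy model and random data as stand-ins.

```python
# Sketch: FGSM perturbation plus an Isolation Forest filter (toy stand-ins).
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.ensemble import IsolationForest

def fgsm(model, x, y, eps):
    """Perturb x in the direction of the sign of the loss gradient (FGSM)."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def image_stats(batch):
    """Simple per-image statistics as detector features (a simplification)."""
    flat = batch.reshape(batch.shape[0], -1)
    return torch.stack(
        [flat.mean(dim=1), flat.std(dim=1), flat.abs().amax(dim=1)], dim=1
    ).numpy()

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # toy classifier
clean = torch.rand(256, 3, 32, 32)                                 # toy "images"
labels = torch.randint(0, 10, (256,))

# Fit the detector on clean-image statistics, then flag perturbed inputs.
detector = IsolationForest(contamination=0.1, random_state=0).fit(image_stats(clean))
adv = fgsm(model, clean, labels, eps=0.05)       # one of the paper's epsilon levels
flags = detector.predict(image_stats(adv))       # -1 marks suspected adversarial inputs
print("flagged:", int((flags == -1).sum()), "of", len(flags))
```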


A Systematic Evaluation of Generative Models on Tabular Transportation Data

arXiv.org Artificial Intelligence

The sharing of large-scale transportation data is beneficial for transportation planning and policymaking. However, it also raises significant security and privacy concerns, as the data may include identifiable personal information, such as individuals' home locations. To address these concerns, synthetic data generation based on real transportation data offers a promising solution that allows privacy protection while potentially preserving data utility. Although various synthetic data generation techniques exist, they are often not tailored to the unique characteristics of transportation data, such as the inherent structure of the transportation networks formed by all trips in the datasets. In this paper, we use New York City taxi data as a case study to conduct a systematic evaluation of the performance of widely used tabular data generative models. In addition to traditional metrics such as distribution similarity, coverage, and privacy preservation, we propose a novel graph-based metric tailored specifically to transportation data. This metric evaluates the similarity between real and synthetic transportation networks, providing potentially deeper insights into their structural and functional alignment. We also introduce an improved privacy metric to address the limitations of the commonly used one. Our experimental results reveal that existing tabular data generative models often fail to perform as consistently as claimed in the literature, particularly when applied to transportation data use cases. Furthermore, our novel graph metric reveals a significant gap between synthetic and real data. This work underscores the potential need to develop generative models specifically tailored to take advantage of the unique characteristics of emerging domains, such as transportation.
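The graph-based metric is not fully specified in the abstract; the sketch below shows one plausible instantiation under assumed column names: build an origin-destination trip graph for the real and synthetic tables, then compare edge overlap and weighted out-degree.

```python
# Sketch: compare real and synthetic trip tables as origin-destination graphs.
# Column names and the toy data are assumptions; the paper's metric may differ.
import numpy as np
import pandas as pd
import networkx as nx

def trip_graph(df):
    """Directed graph whose nodes are zones and edge weights are trip counts."""
    counts = df.groupby(["pickup_zone", "dropoff_zone"]).size()
    g = nx.DiGraph()
    for (u, v), w in counts.items():
        g.add_edge(u, v, weight=int(w))
    return g

def edge_jaccard(g_real, g_synth):
    """Share of origin-destination pairs present in both networks."""
    e_real, e_synth = set(g_real.edges()), set(g_synth.edges())
    return len(e_real & e_synth) / len(e_real | e_synth)

def mean_out_degree_gap(g_real, g_synth):
    """Absolute gap in mean weighted out-degree between the two networks."""
    mean_deg = lambda g: np.mean([d for _, d in g.out_degree(weight="weight")])
    return abs(mean_deg(g_real) - mean_deg(g_synth))

real = pd.DataFrame({"pickup_zone": [1, 1, 2], "dropoff_zone": [2, 3, 3]})
synth = pd.DataFrame({"pickup_zone": [1, 2, 2], "dropoff_zone": [2, 3, 1]})
print(edge_jaccard(trip_graph(real), trip_graph(synth)))
print(mean_out_degree_gap(trip_graph(real), trip_graph(synth)))
```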


Quantum Adversarial Machine Learning and Defense Strategies: Challenges and Opportunities

arXiv.org Artificial Intelligence

As quantum computing continues to advance, the development of quantum-secure neural networks is crucial to prevent adversarial attacks. This paper proposes three quantum-secure design principles: (1) using post-quantum cryptography, (2) employing quantum-resistant neural network architectures, and (3) ensuring transparent and accountable development and deployment. These principles are supported by various quantum strategies, including quantum data anonymization, quantum-resistant neural networks, and quantum encryption. The paper also identifies open issues in quantum security, privacy, and trust, and recommends exploring adaptive adversarial attacks and auto adversarial attacks as future directions. The proposed design principles and recommendations provide guidance for developing quantum-secure neural networks, ensuring the integrity and reliability of machine learning models in the quantum era.


Graph-Powered Defense: Controller Area Network Intrusion Detection for Unmanned Aerial Vehicles

arXiv.org Artificial Intelligence

Services such as delivery, farming, and environmental monitoring have expanded rapidly over the past decade with the adoption of Unmanned Aerial Vehicles (UAVs). Yet, UAVs are not sufficiently robust against cyberattacks, especially on the Controller Area Network (CAN) bus. The CAN bus is a general-purpose vehicle-bus standard that enables microcontrollers and in-vehicle computers to interact, primarily connecting different Electronic Control Units (ECUs). In this study, we address some of the most critical security weaknesses in UAVs by developing a novel graph-based intrusion detection system (IDS) leveraging the Uncomplicated Application-level Vehicular Communication and Networking (UAVCAN) protocol. First, we decode CAN messages based on the UAVCAN protocol specification; second, we present a comprehensive method for transforming tabular UAVCAN messages into graph structures; lastly, we apply various graph-based machine learning models for detecting cyberattacks on the CAN bus, including graph convolutional neural networks (GCNNs), graph attention networks (GATs), Graph Sample and Aggregate Networks (GraphSAGE), and graph structure-based transformers. Our findings show that inductive models such as GATs, GraphSAGE, and graph-based transformers can achieve competitive and even better accuracy than transductive models like GCNNs in detecting various types of intrusions, with minimal reliance on the protocol specification, thus providing a generic and robust solution for CAN bus security in UAVs. We also compared our results with a baseline single-layer Long Short-Term Memory (LSTM) model and found that all of our graph-based models perform better without using any features decoded from the UAVCAN protocol, highlighting higher detection performance with protocol-independent capability.
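The CAN-message-to-graph transformation is the paper's contribution and is not reproduced here; the sketch below only shows the downstream classification step, a minimal GraphSAGE node classifier in PyTorch Geometric over an assumed toy graph in which nodes are messages and edges link temporally adjacent frames.

```python
# Sketch: GraphSAGE intrusion classifier over a toy CAN-message graph.
# The graph construction (nodes = messages, temporal edges) is an assumption.
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import SAGEConv

torch.manual_seed(0)
x = torch.rand(100, 8)                      # 100 message nodes, 8 toy features
src = torch.arange(99)
edge_index = torch.stack([src, src + 1])    # edge i -> i+1 (temporal adjacency)
y = torch.randint(0, 2, (100,))             # 0 = benign, 1 = intrusion
data = Data(x=x, edge_index=edge_index, y=y)

class SageIDS(torch.nn.Module):
    def __init__(self, in_dim, hidden, classes):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hidden)
        self.conv2 = SAGEConv(hidden, classes)

    def forward(self, x, edge_index):
        return self.conv2(F.relu(self.conv1(x, edge_index)), edge_index)

model = SageIDS(8, 32, 2)
opt = torch.optim.Adam(model.parameters(), lr=0.01)
for _ in range(50):
    opt.zero_grad()
    loss = F.cross_entropy(model(data.x, data.edge_index), data.y)
    loss.backward()
    opt.step()
print("final training loss:", float(loss))
```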


Crash Severity Risk Modeling Strategies under Data Imbalance

arXiv.org Artificial Intelligence

This study investigates crash severity risk modeling strategies for work zones involving large vehicles (i.e., trucks, buses, and vans) when there is an imbalance in the crash data between low-severity (LS) and high-severity (HS) crashes. We utilized crash data involving large vehicles in South Carolina work zones between 2014 and 2018, which included four times more LS crashes than HS crashes. The objective of this study is to explore the crash severity prediction performance of various models under different feature selection and data balancing techniques. The findings highlight a disparity between LS and HS predictions, with HS crashes predicted less accurately than LS crashes due to class imbalance and feature overlap between the two classes. Combining features from multiple feature selection techniques (statistical correlation, feature importance, recursive elimination, statistical tests, and mutual information) slightly improves HS crash prediction performance. Data balancing techniques such as NearMiss-1 and RandomUnderSampler maximize HS recall when paired with certain prediction models, such as Bayesian Mixed Logit (BML), NeuralNet, and RandomForest, making them suitable for HS crash prediction. Conversely, RandomOverSampler, HS class weighting, and Kernel-based Synthetic Minority Oversampling (K-SMOTE), used with prediction models such as BML, CatBoost, and LightGBM, achieve a balanced performance, defined as an equitable trade-off between LS and HS prediction performance metrics. These insights provide safety analysts with guidance for selecting models, feature selection techniques, and data balancing techniques that align with their specific safety objectives, offering a robust foundation for enhancing work-zone crash severity prediction.
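A minimal sketch of the undersampling comparison described above, assuming an already-encoded feature table; the file name, column names, and model settings are placeholders rather than the study's configuration.

```python
# Sketch: compare NearMiss-1 and RandomUnderSampler for high-severity (HS) recall.
# Assumes numerically encoded features; names and file path are placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from imblearn.under_sampling import NearMiss, RandomUnderSampler

df = pd.read_csv("workzone_crashes_encoded.csv")        # hypothetical extract
X, y = df.drop(columns=["severity"]), df["severity"]    # 1 = HS, 0 = LS
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

for name, sampler in [("NearMiss-1", NearMiss(version=1)),
                      ("RandomUnderSampler", RandomUnderSampler(random_state=1))]:
    # Balance the training split only, then evaluate HS recall on the test split.
    X_bal, y_bal = sampler.fit_resample(X_tr, y_tr)
    clf = RandomForestClassifier(n_estimators=300, random_state=1).fit(X_bal, y_bal)
    print(name, "HS recall:", round(recall_score(y_te, clf.predict(X_te), pos_label=1), 3))
```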


An AutoML-based approach for Network Intrusion Detection

arXiv.org Artificial Intelligence

In this paper, we present an automated machine learning (AutoML) approach for network intrusion detection, leveraging a stacked ensemble model developed using the MLJAR AutoML framework. Our methodology combines multiple machine learning algorithms, including LightGBM, CatBoost, and XGBoost, to enhance detection accuracy and robustness. By automating model selection, feature engineering, and hyperparameter tuning, our approach reduces the manual overhead typically associated with traditional machine learning methods. Extensive experimentation on the NSL-KDD dataset demonstrates that the stacked ensemble model outperforms individual models, achieving high accuracy and minimizing false positives. The AutoML-driven stacked ensemble achieved the highest performance, with 90% accuracy and an 89% F1 score, outperforming individual models such as Random Forest (78% accuracy, 78% F1 score), XGBoost and CatBoost (both 80% accuracy, 80% F1 score), and LightGBM (78% accuracy, 78% F1 score). These findings underscore the benefits of using AutoML for network intrusion detection, providing a more adaptable and efficient solution for network security applications.
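A rough sketch of the stacking idea, built with scikit-learn's StackingClassifier over the same base learners rather than the MLJAR framework itself; the synthetic data stands in for an encoded NSL-KDD split.

```python
# Sketch: stacked ensemble of LightGBM, CatBoost, and XGBoost base learners.
# Uses synthetic data as a stand-in for encoded NSL-KDD features; this mirrors
# the stacking idea but is not the MLJAR-generated model.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("xgb", XGBClassifier(n_estimators=200, eval_metric="logloss")),
        ("lgbm", LGBMClassifier(n_estimators=200)),
        ("cat", CatBoostClassifier(iterations=200, verbose=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # meta-learner
    cv=5,
)
stack.fit(X_tr, y_tr)
print("held-out accuracy:", round(stack.score(X_te, y_te), 3))
```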


Development and Evaluation of Ensemble Learning-based Environmental Methane Detection and Intensity Prediction Models

arXiv.org Artificial Intelligence

The environmental impacts of global warming driven by methane (CH4) emissions have catalyzed significant research initiatives in developing novel technologies that enable proactive and rapid detection of CH4. Several data-driven machine learning (ML) models were tested to determine how well they identified fugitive CH4 and its intensity in the affected areas. Various meteorological characteristics, including wind speed, temperature, pressure, relative humidity, water vapor, and heat flux, were included in the simulation. We used ensemble learning to determine the best-performing weighted ensemble ML models, built upon several weaker lower-layer ML models, to (i) detect the presence of CH4 as a classification problem and (ii) predict the intensity of CH4 as a regression problem.
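A minimal sketch of a weighted ensemble over weaker base learners for both tasks, using scikit-learn's voting estimators on synthetic data; the base models and weights are placeholders, not the study's tuned configuration.

```python
# Sketch: weighted ensembles for (i) CH4 detection and (ii) intensity prediction.
# Synthetic data, base learners, and weights are placeholders.
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import (GradientBoostingClassifier, GradientBoostingRegressor,
                              RandomForestClassifier, RandomForestRegressor,
                              VotingClassifier, VotingRegressor)
from sklearn.linear_model import LogisticRegression, Ridge

# (i) Detection: weighted soft-voting classifier over weaker base models.
Xc, yc = make_classification(n_samples=1000, n_features=6, random_state=0)
detector = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("gb", GradientBoostingClassifier(random_state=0)),
                ("lr", LogisticRegression(max_iter=1000))],
    voting="soft", weights=[2, 2, 1],
).fit(Xc, yc)

# (ii) Intensity: weighted-average regressor for CH4 concentration.
Xr, yr = make_regression(n_samples=1000, n_features=6, noise=5.0, random_state=0)
intensity = VotingRegressor(
    estimators=[("rf", RandomForestRegressor(random_state=0)),
                ("gb", GradientBoostingRegressor(random_state=0)),
                ("ridge", Ridge())],
    weights=[2, 2, 1],
).fit(Xr, yr)

print("detection accuracy:", round(detector.score(Xc, yc), 3))
print("intensity R^2:", round(intensity.score(Xr, yr), 3))
```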


Bayesian Parameter Estimations for Grey System Models in Online Traffic Speed Predictions

arXiv.org Artificial Intelligence

This paper presents Bayesian parameter estimation for the parameters (sometimes referred to as hyperparameters) of first-order Grey system models. There are different forms of first-order Grey system models, including $GM(1,1)$, $GM(1,1|\cos(\omega t))$, $GM(1,1|\sin(\omega t))$, and $GM(1,1|\cos(\omega t), \sin(\omega t))$. The whitenization equation of these models is a first-order linear differential equation of the form \[ \frac{dx}{dt} + a x = f(t) \] where $a$ is a parameter and $f(t) = b$ in $GM(1,1)$, $f(t) = b_1\cos(\omega t) + b_2$ in $GM(1,1|\cos(\omega t))$, $f(t) = b_1\sin(\omega t) + b_2$ in $GM(1,1|\sin(\omega t))$, $f(t) = b_1\sin(\omega t) + b_2\cos(\omega t) + b_3$ in $GM(1,1|\cos(\omega t), \sin(\omega t))$, and $f(t) = b x^2$ in the Grey Verhulst model (GVM), where $b$, $b_1$, $b_2$, and $b_3$ are parameters. The results from the Bayesian estimations are compared to models estimated by least squares with fixed $\omega$. We found that rolling Bayesian estimation of the GM parameters allows us to estimate the parameters in all of these model forms. Based on the data used for the comparison, the numerical results showed that models with Bayesian parameter estimation are up to 45% more accurate in terms of mean squared error.
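For concreteness, the plain $GM(1,1)$ case with $f(t) = b$ admits the standard closed-form time response below, obtained by solving the whitenization equation for the first-order accumulated series $x^{(1)}$ of the observations $x^{(0)}$; this is the textbook result and is included only to make the model forms concrete. \[ \frac{dx^{(1)}}{dt} + a\,x^{(1)} = b \quad\Longrightarrow\quad \hat{x}^{(1)}(k+1) = \Bigl(x^{(0)}(1) - \frac{b}{a}\Bigr)e^{-ak} + \frac{b}{a}, \] with the restored (de-accumulated) predictions \[ \hat{x}^{(0)}(k+1) = \hat{x}^{(1)}(k+1) - \hat{x}^{(1)}(k). \]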