AITopics | smoteenn

Collaborating Authors

smoteenn

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Tabular Data with Class Imbalance: Predicting Electric Vehicle Crash Severity with Pretrained Transformers (TabPFN) and Mamba-Based Models

Somvanshi, Shriyank, Hebli, Pavan, Chhetri, Gaurab, Das, Subasish

arXiv.org Artificial IntelligenceSep-16-2025

This study presents a deep tabular learning framework for predicting crash severity in electric vehicle (EV) collisions using real-world crash data from Texas (2017-2023). After filtering for electric-only vehicles, 23,301 EV-involved crash records were analyzed. Feature importance techniques using XGBoost and Random Forest identified intersection relation, first harmful event, person age, crash speed limit, and day of week as the top predictors, along with advanced safety features like automatic emergency braking. To address class imbalance, Synthetic Minority Over-sampling Technique and Edited Nearest Neighbors (SMOTEENN) resampling was applied. Three state-of-the-art deep tabular models, TabPFN, MambaNet, and MambaAttention, were benchmarked for severity prediction. While TabPFN demonstrated strong generalization, MambaAttention achieved superior performance in classifying severe injury cases due to its attention-based feature reweighting. The findings highlight the potential of deep tabular architectures for improving crash severity prediction and enabling data-driven safety interventions in EV crash contexts.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2509.11449

Country:

North America > United States > Texas (0.36)
Asia > Philippines > Luzon > National Capital Region > City of Manila (0.14)

Genre: Research Report (0.82)

Industry:

Transportation > Ground > Road (1.00)
Transportation > Electric Vehicle (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Crash Severity Analysis of Child Bicyclists using Arm-Net and MambaNet

Somvanshi, Shriyank, Chakraborty, Rohit, Das, Subasish, Dutta, Anandi K

arXiv.org Artificial IntelligenceMar-13-2025

Child bicyclists (14 years and younger) are among the most vulnerable road users, often experiencing severe injuries or fatalities in crashes. This study analyzed 2,394 child bicyclist crashes in Texas from 2017 to 2022 using two deep tabular learning models (ARM-Net and MambaNet). To address the issue of data imbalance, the SMOTEENN technique was applied, resulting in balanced datasets that facilitated accurate crash severity predictions across three categories: Fatal/Severe (KA), Moderate/Minor (BC), and No Injury (O). The findings revealed that MambaNet outperformed ARM-Net, achieving higher precision, recall, F1-scores, and accuracy, particularly in the KA and O categories. Both models highlighted challenges in distinguishing BC crashes due to overlapping characteristics. These insights underscored the value of advanced tabular deep learning methods and balanced datasets in understanding crash severity. While limitations such as reliance on categorical data exist, future research could explore continuous variables and real-time behavioral data to enhance predictive modeling and crash mitigation strategies.

category, mambanet, severity, (13 more...)

arXiv.org Artificial Intelligence

2503.11003

Country:

North America > United States > Texas > Hays County > San Marcos (0.05)
Asia > Middle East > Israel (0.05)
North America > United States > New York (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Sports > Cycling (0.88)
Transportation (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Applying Tabular Deep Learning Models to Estimate Crash Injury Types of Young Motorcyclists

Somvanshi, Shriyank, Tusti, Anannya Ghosh, Chakraborty, Rohit, Das, Subasish

arXiv.org Artificial IntelligenceMar-13-2025

Young motorcyclists, particularly those aged 15 to 24 years old, face a heightened risk of severe crashes due to factors such as speeding, traffic violations, and helmet usage. This study aims to identify key factors influencing crash severity by analyzing 10,726 young motorcyclist crashes in Texas from 2017 to 2022. Two advanced tabular deep learning models, ARMNet and MambaNet, were employed, using an advanced resampling technique to address class imbalance. The models were trained to classify crashes into three severity levels, Fatal or Severe, Moderate or Minor, and No Injury. ARMNet achieved an accuracy of 87 percent, outperforming 86 percent of Mambanet, with both models excelling in predicting severe and no injury crashes while facing challenges in moderate crash classification. Key findings highlight the significant influence of demographic, environmental, and behavioral factors on crash outcomes. The study underscores the need for targeted interventions, including stricter helmet enforcement and educational programs customized to young motorcyclists. These insights provide valuable guidance for policymakers in developing evidence-based strategies to enhance motorcyclist safety and reduce crash severity.

mambanet, motorcyclist, young motorcyclist, (14 more...)

arXiv.org Artificial Intelligence

2503.10474

Country:

North America > United States > Texas > Hays County > San Marcos (0.04)
North America > Canada > British Columbia (0.04)
North America > United States > New York > New York County > New York City (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine (1.00)
Transportation > Ground > Road (0.96)
Government (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Advanced User Credit Risk Prediction Model using LightGBM, XGBoost and Tabnet with SMOTEENN

Yu, Chang, Jin, Yixin, Xing, Qianwen, Zhang, Ye, Guo, Shaobo, Meng, Shuchen

arXiv.org Artificial IntelligenceAug-6-2024

Bank credit risk is a significant challenge in modern financial transactions, and the ability to identify qualified credit card holders among a large number of applicants is crucial for the profitability of a bank'sbank's credit card business. In the past, screening applicants'applicants' conditions often required a significant amount of manual labor, which was time-consuming and labor-intensive. Although the accuracy and reliability of previously used ML models have been continuously improving, the pursuit of more reliable and powerful AI intelligent models is undoubtedly the unremitting pursuit by major banks in the financial industry. In this study, we used a dataset of over 40,000 records provided by a commercial bank as the research object. We compared various dimensionality reduction techniques such as PCA and T-SNE for preprocessing high-dimensional datasets and performed in-depth adaptation and tuning of distributed models such as LightGBM and XGBoost, as well as deep models like Tabnet. After a series of research and processing, we obtained excellent research results by combining SMOTEENN with these techniques. The experiments demonstrated that LightGBM combined with PCA and SMOTEENN techniques can assist banks in accurately predicting potential high-quality customers, showing relatively outstanding performance compared to other models.

arxiv preprint arxiv, lightgbm, smoteenn, (13 more...)

arXiv.org Artificial Intelligence

2408.03497

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry: Banking & Finance > Credit (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Fraud Detection Using Optimized Machine Learning Tools Under Imbalance Classes

Isangediok, Mary, Gajamannage, Kelum

arXiv.org Artificial IntelligenceSep-4-2022

Fraud detection is a challenging task due to the changing nature of fraud patterns over time and the limited availability of fraud examples to learn such sophisticated patterns. Thus, fraud detection with the aid of smart versions of machine learning (ML) tools is essential to assure safety. Fraud detection is a primary ML classification task; however, the optimum performance of the corresponding ML tool relies on the usage of the best hyperparameter values. Moreover, classification under imbalanced classes is quite challenging as it causes poor performance in minority classes, which most ML classification techniques ignore. Thus, we investigate four state-of-the-art ML techniques, namely, logistic regression, decision trees, random forest, and extreme gradient boost, that are suitable for handling imbalance classes to maximize precision and simultaneously reduce false positives. First, these classifiers are trained on two original benchmark unbalanced fraud detection datasets, namely, phishing website URLs and fraudulent credit card transactions. Then, three synthetically balanced datasets are produced for each original data set by implementing the sampling frameworks, namely, RandomUnderSampler, SMOTE, and SMOTEENN. The optimum hyperparameters for all the 16 experiments are revealed using the method RandomzedSearchCV. The validity of the 16 approaches in the context of fraud detection is compared using two benchmark performance metrics, namely, area under the curve of receiver operating characteristics (AUC ROC) and area under the curve of precision and recall (AUC PR). For both phishing website URLs and credit card fraud transaction datasets, the results indicate that extreme gradient boost trained on the original data shows trustworthy performance in the imbalanced dataset and manages to outperform the other three methods in terms of both AUC ROC and AUC PR.

classifier, dataset, detection, (13 more...)

arXiv.org Artificial Intelligence

2209.01642

Country:

Europe (0.14)
North America > United States > Texas > Nueces County > Corpus Christi (0.04)
Africa (0.04)

Genre:

Research Report > New Finding (0.35)
Research Report > Experimental Study (0.35)

Industry:

Law Enforcement & Public Safety > Fraud (1.00)
Information Technology > Security & Privacy (1.00)
Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

How to Develop an Imbalanced Classification Model to Detect Oil Spills

#artificialintelligenceFeb-22-2020, 05:25:28 GMT

Many imbalanced classification tasks require a skillful model that predicts a crisp class label, where both classes are equally important. An example of an imbalanced classification problem where a class label is required and both classes are equally important is the detection of oil spills or slicks in satellite images. The detection of a spill requires mobilizing an expensive response, and missing an event is equally expensive, causing damage to the environment. One way to evaluate imbalanced classification models that predict crisp labels is to calculate the separate accuracy on the positive class and the negative class, referred to as sensitivity and specificity. These two measures can then be averaged using the geometric mean, referred to as the G-mean, that is insensitive to the skewed class distribution and correctly reports on the skill of the model on both classes. In this tutorial, you will discover how to develop a model to predict the presence of an oil spill in satellite images and evaluate it using the G-mean metric. Develop an Imbalanced Classification Model to Detect Oil Spills Photo by Lenny K Photography, some rights reserved. In this project, we will use a standard imbalanced machine learning dataset referred to as the "oil spill" dataset, "oil slicks" dataset or simply "oil."

artificial intelligence, dataset, machine learning, (17 more...)

#artificialintelligence

Genre:

Research Report (0.54)
Instructional Material > Course Syllabus & Notes (0.48)

Industry: Energy > Oil & Gas (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.33)

Add feedback