Vehicle Classification under Extreme Imbalance: A Comparative Study of Ensemble Learning and CNNs

Sep-30-2025–arXiv.org Artificial Intelligence

We curate a 16-class corpus (~47k images) by merging Kaggle, ImageNet, and web-crawled data, and create six balanced variants via SMOTE oversampling and targeted undersampling. Lightweight ensembles, such as Random Forest, AdaBoost, and a soft-voting combiner built on MobileNet-V2 features, are benchmarked against a configurable ResNet-style CNN trained with strong augmentation and label smoothing. The best ensemble (SMOTE-combined) attains 74.8% test accuracy, while the final CNN checkpoint achieves 79.19% on the full test set and 81.25% on an unseen EE531 inference batch, confirming the overall advantage of deep models. Nonetheless, the most under-represented class (Barge) remains a failure mode, highlighting the limits of rebalancing alone and the persistent challenge of extreme imbalance. Results suggest prioritizing additional minority-class collection and cost-sensitive objectives (e.g., focal loss), and exploring hybrid ensemble-CNN pipelines to combine interpretability with representational power.
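The abstract names two imbalance remedies, SMOTE oversampling and focal loss, without giving implementation details. The following is a minimal, illustrative sketch of the core idea behind each, not the authors' code. The function names `smote_like_samples` and `focal_loss`, the toy feature values, the neighbour count `k=3`, and the focusing parameter `gamma=2.0` are all assumptions made for this example.

```python
import math
import random


def smote_like_samples(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + u * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t),
    where p_t is the predicted probability of the true class."""
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Toy 2-D "feature vectors" for a rare class (hypothetical values).
barge_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like_samples(barge_features, n_new=5)

easy = focal_loss([0.9, 0.1], target=0)  # confident, correct prediction
hard = focal_loss([0.1, 0.9], target=0)  # confident, wrong prediction
```

Each synthetic point lies on the segment between two real minority samples, so oversampling of this kind adds no information beyond what the few real samples already span. This is consistent with the abstract's observation that rebalancing alone does not rescue the rarest class. Focal loss takes a different route: with `gamma > 0` it down-weights well-classified examples, so `hard` contributes far more to the objective than `easy`, and with `gamma = 0` it reduces to ordinary cross-entropy.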

accuracy, artificial intelligence, machine learning

Genre:
- Research Report > New Finding (0.48)

Industry:
- Transportation (0.48)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Performance Analysis > Accuracy (1.00)
  - Neural Networks > Deep Learning (0.69)
