Ustnlp16 at SemEval-2025 Task 9: Improving Model Performance through Imbalance Handling and Focal Loss
Zhuoang Cai, Zhenghao Li, Yang Liu, Liyuan Guo, Yangqiu Song
arXiv.org Artificial Intelligence
Classification tasks often suffer from imbalanced data distribution, which presents challenges in food hazard detection due to severe class imbalances, short and unstructured text, and overlapping semantic categories. In this paper, we present our system for SemEval-2025 Task 9: Food Hazard Detection, which addresses these issues by applying data augmentation techniques to improve classification performance. We utilize transformer-based models, BERT and RoBERTa, as backbone classifiers and explore various data balancing strategies, including random oversampling, Easy Data Augmentation (EDA), and focal loss. Our experiments show that EDA effectively mitigates class imbalance, leading to significant improvements in accuracy and F1 scores. Furthermore, combining focal loss with oversampling and EDA further enhances model robustness, particularly for hard-to-classify examples. These findings contribute to the development of more effective NLP-based classification models for food hazard detection.
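The focal loss mentioned in the abstract down-weights easy, confidently classified examples so that training focuses on hard ones. A minimal per-example sketch (the `gamma` and `alpha` values below are the common defaults from the focal-loss literature, not values reported by this paper):

```python
import math

def focal_loss(p_correct, gamma=2.0, alpha=0.25):
    """Focal loss for a single example.

    p_correct : model's predicted probability for the true class.
    gamma     : focusing parameter; larger values suppress easy examples more.
    alpha     : class-balancing weight.
    With gamma=0 and alpha=1 this reduces to standard cross-entropy.
    """
    return -alpha * (1.0 - p_correct) ** gamma * math.log(p_correct)

# An easy, confident prediction contributes far less loss than a hard one,
# which is why focal loss helps on hard-to-classify minority examples.
easy = focal_loss(0.95)
hard = focal_loss(0.30)
```

In practice the same formula is applied as a drop-in replacement for cross-entropy over a batch of logits; here the scalar form just makes the down-weighting effect visible.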
May 2, 2025