Is augmentation effective to improve prediction in imbalanced text datasets?

Assunção, Gabriel O., Izbicki, Rafael, Prates, Marcos O.

Apr-20-2023–arXiv.org Artificial Intelligence

Imbalanced datasets present a significant challenge for machine learning models, often leading to biased predictions. To address this issue, data augmentation techniques are widely used in natural language processing (NLP) to generate new samples for the minority class. However, in this paper, we challenge the common assumption that data augmentation is always necessary to improve predictions on imbalanced datasets. Instead, we argue that adjusting the classifier cutoffs without data augmentation can produce similar results to oversampling techniques. Our study provides theoretical and empirical evidence to support this claim. Our findings contribute to a better understanding of the strengths and limitations of different approaches to dealing with imbalanced data, and help researchers and practitioners make informed decisions about which methods to use for a given task.
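The claim that moving the classifier cutoff can stand in for oversampling can be illustrated with a small sketch. It is not the paper's derivation, which this page does not reproduce. It assumes the standard prior-shift identity: oversampling the minority class to a 50/50 split multiplies the posterior odds by (1 - prevalence) / prevalence. The prevalence, the probabilities, and the function names below are all hypothetical.

```python
def rebalanced_posterior(p, prevalence):
    """Posterior an ideal classifier would report after the minority class
    is oversampled to a 50/50 split (prior-shift identity, an assumption)."""
    odds = p / (1.0 - p) * (1.0 - prevalence) / prevalence
    return odds / (1.0 + odds)


def predict_with_cutoff(probs, cutoff):
    """Label a sample 1 (minority class) when its probability reaches the cutoff."""
    return [int(p >= cutoff) for p in probs]


# Hypothetical values: 10% minority share and P(y=1 | x) from a model
# trained on the original, imbalanced data.
prevalence = 0.1
probs = [0.02, 0.08, 0.15, 0.30, 0.60]

# Default 0.5 cutoff on the imbalanced model: the minority class is rarely predicted.
default = predict_with_cutoff(probs, 0.5)

# Option A: keep the model, lower the cutoff to the minority prevalence.
adjusted = predict_with_cutoff(probs, prevalence)

# Option B: emulate oversampling to 50/50, then use the usual 0.5 cutoff.
oversampled = predict_with_cutoff(
    [rebalanced_posterior(p, prevalence) for p in probs], 0.5
)

print(default, adjusted, oversampled)
```

Under this idealized assumption, options A and B label every sample identically, because the rebalanced posterior reaches 0.5 exactly when the original posterior reaches the prevalence. Real oversampling also changes the fitted model, so in practice the agreement is approximate.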

machine learning, natural language, text classification



Country:
- South America > Brazil
  - Minas Gerais > Belo Horizonte (0.04)
- North America > United States
  - Iowa (0.05)
  - California (0.04)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)
- Asia > China
  - Anhui Province > Hefei (0.04)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Health & Medicine (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Text Classification (0.47)
  - Machine Learning > Performance Analysis
    - Accuracy (0.47)
