Meta learning with language models: Challenges and opportunities in the classification of imbalanced text

Apostol Vassilev, Honglan Jin, Munawar Hasan

Oct-24-2023–arXiv.org Artificial Intelligence

Out of policy speech (OOPS) has permeated social media, with serious consequences for both individuals and society. Although it makes up only a small fraction of the content generated daily on social media, sifting through the data to quickly identify and remove the toxic content is difficult. The scale of the problem long ago passed the point at which automated detection became necessary. Yet it remains a challenging problem for machine learning (ML) because of the way OOPS manifests itself in datasets: as context-dependent, nuanced, non-colloquial language that may even be syntactically incorrect. Because OOPS content is usually only a small fraction of a dataset, there is a high imbalance between OOPS and in-policy text. Relatedly, few high-quality labeled datasets exist with consistent definitions of OOPS and in-policy content. These difficulties are exacerbated by significant differences between the distribution of the data a model was trained on and the distribution of the data it sees during deployment. Faced with all of these challenges, ML models applied to natural language processing (NLP) tasks quickly reach a performance ceiling that limits their usefulness for sensitive tasks such as OOPS detection.
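To make the imbalance problem concrete, here is a minimal sketch in plain Python. It is not the authors' method. The label names, the 5-in-100 split, and the helper functions `class_weights` and `recall` are all hypothetical, chosen only for illustration. The sketch shows why accuracy alone is misleading when OOPS is rare, and how inverse-frequency class weights (one common mitigation, assumed here rather than taken from the paper) can be computed.

```python
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def recall(y_true, y_pred, positive):
    """Fraction of true `positive` examples that were predicted as `positive`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy data: 5 OOPS examples among 100 texts.
y_true = ["oops"] * 5 + ["in_policy"] * 95

# A trivial classifier that labels everything as in-policy.
y_pred = ["in_policy"] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
oops_recall = recall(y_true, y_pred, "oops")
weights = class_weights(y_true)

print(f"accuracy={accuracy:.2f}, OOPS recall={oops_recall:.2f}")
print(f"class weights={weights}")
```

On this toy split the trivial classifier looks accurate while detecting no OOPS content at all, which is why per-class metrics such as recall are usually reported alongside accuracy for imbalanced text classification.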


