Towards Weakly-Supervised Hate Speech Classification Across Datasets

Jin, Yiping; Wanner, Leo; Kadam, Vishakha Laxman; Shvets, Alexander

May-30-2023 – arXiv.org Artificial Intelligence

As pointed out by several scholars, current research on hate speech (HS) recognition is characterized by unsystematic data creation strategies and diverging annotation schemata. Consequently, supervised-learning models tend to generalize poorly to datasets they were not trained on, and the performance of models trained on datasets labeled using different HS taxonomies cannot be compared. To ease this problem, we propose applying extremely weak supervision that relies only on the class name rather than on class samples from the annotated data. We demonstrate the effectiveness of a state-of-the-art weakly-supervised text classification model in various in-dataset and cross-dataset settings. Furthermore, we conduct an in-depth quantitative and qualitative analysis of the source of poor generalizability of HS classification models.

large language model, machine learning, natural language

Country:
- Oceania > Australia
  - Victoria > Melbourne (0.04)
  - New South Wales > Sydney (0.04)
- North America > United States
  - Texas > Travis County
    - Austin (0.04)
  - Minnesota > Hennepin County
    - Minneapolis (0.14)
  - California
    - Santa Clara County > Stanford (0.04)
    - San Diego County > San Diego (0.04)
- Europe
  - Russia (0.04)
  - Germany (0.04)
  - Czechia > Prague (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Italy > Piedmont
    - Turin Province > Turin (0.04)
  - France > Provence-Alpes-Côte d'Azur
    - Bouches-du-Rhône > Marseille (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia
  - Pakistan (0.04)
  - Vietnam (0.04)
  - Russia (0.04)
  - Japan (0.04)
  - China > Hong Kong (0.04)
  - Taiwan > Taiwan Province
    - Taipei (0.04)
  - Middle East
    - Israel (0.04)
    - Syria (0.04)
    - Iran (0.04)
    - UAE > Abu Dhabi Emirate
      - Abu Dhabi (0.04)
    - Qatar > Ad-Dawhah
      - Doha (0.04)
    - Palestine > Gaza Strip
      - Gaza Governorate > Gaza (0.04)
    - Iraq > Nineveh Governorate
      - Mosul (0.04)
  - India > Maharashtra
    - Pune (0.04)
- Africa
  - Nigeria (0.04)
  - Central African Republic > Ombella-M'Poko
    - Bimbo (0.04)

Genre:
- Research Report (0.82)

Industry:
- Media (1.00)
- Law Enforcement & Public Safety > Terrorism (0.46)
- Law > Civil Rights & Constitutional Law (0.30)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (0.47)
    - Text Processing (0.46)
  - Machine Learning
    - Neural Networks (0.68)
    - Statistical Learning > Support Vector Machines (0.46)
