A Weakly Supervised Classifier and Dataset of White Supremacist Language

Yoder, Michael Miller; Diab, Ahmad; Brown, David West; Carley, Kathleen M.

Jun-27-2023, arXiv.org Artificial Intelligence

We present a dataset and classifier for detecting the language of white supremacist extremism, a growing issue in online hate speech. Our weakly supervised classifier is trained on large datasets of text from explicitly white supremacist domains paired with neutral and anti-racist data from similar domains. We demonstrate that this approach improves generalization performance to new domains. Incorporating anti-racist texts as counterexamples to white supremacist language mitigates bias.
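
The weak supervision described above can be illustrated with a minimal sketch. It is not the authors' classifier, which this abstract does not specify: a small multinomial Naive Bayes model stands in for it, and every token (`ideology_marker`, `identity_term`, and so on) and function name (`weak_label`, `train_nb`, `log_odds`, `predict`) is a hypothetical placeholder. Each document inherits its label from the kind of domain it was collected from, and the sketch compares training with and without anti-racist counterexamples.

```python
import math
from collections import Counter


def weak_label(sources):
    """Give every document the label of the source pool it came from."""
    return [(text, label) for label, texts in sources for text in texts]


def train_nb(examples):
    """Collect per-class token counts for a multinomial Naive Bayes model."""
    counts = {0: Counter(), 1: Counter()}
    docs = Counter()
    for text, label in examples:
        counts[label].update(text.split())
        docs[label] += 1
    vocab = set(counts[0]) | set(counts[1])
    return counts, docs, vocab


def log_odds(model, token):
    """Log-odds that one token signals the positive class (add-one smoothing)."""
    counts, _, vocab = model

    def prob(label):
        total = sum(counts[label].values()) + len(vocab)
        return (counts[label][token] + 1) / total

    return math.log(prob(1) / prob(0))


def predict(model, text):
    """Return 1 if the summed log-odds favor the positive class, else 0."""
    _, docs, vocab = model
    score = math.log(docs[1] / docs[0])  # class prior
    score += sum(log_odds(model, t) for t in text.split() if t in vocab)
    return 1 if score > 0 else 0


# Placeholder corpora (assumptions): label 1 = text from explicitly
# extremist domains, label 0 = neutral or anti-racist text from similar domains.
positive = ["ideology_marker identity_term", "ideology_marker identity_term slogan"]
neutral = ["weather sports news", "recipe sports news"]
anti_racist = ["identity_term solidarity justice", "identity_term justice rights"]

baseline = train_nb(weak_label([(1, positive), (0, neutral)]))
with_counter = train_nb(weak_label([(1, positive), (0, neutral + anti_racist)]))

# Compare how strongly a shared identity term is tied to the positive class.
print(log_odds(baseline, "identity_term"))
print(log_odds(with_counter, "identity_term"))

# An unseen anti-racist style text that mentions the identity term.
print(predict(baseline, "identity_term rights"))
print(predict(with_counter, "identity_term rights"))
```

In this toy setup, the placeholder `identity_term` occurs in both the positive pool and the anti-racist pool, so adding the counterexamples lowers its weight as evidence for the positive class. That is one plausible reading of how counterexamples could mitigate bias, under the assumptions stated here.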

classifier, dataset, ideology

Country:
- Africa > South Africa (0.04)
- North America
  - Dominican Republic (0.04)
  - Canada (0.04)
  - United States
    - Virginia (0.04)
    - North Carolina (0.04)
    - Pennsylvania > Allegheny County
      - Pittsburgh (0.14)
    - New York > New York County
      - New York City (0.04)
- Europe
  - Sweden (0.14)
  - Russia (0.04)
  - Germany (0.04)
  - Denmark (0.04)
  - Norway (0.04)
  - France (0.04)
  - Croatia (0.04)
  - Ireland (0.04)
  - Hungary (0.04)
  - Italy > Piedmont
    - Turin Province > Turin (0.04)
  - United Kingdom > England
    - Oxfordshire > Oxford (0.04)
- Asia
  - Russia (0.04)
  - China > Hong Kong (0.04)
  - Middle East
    - Jordan (0.04)
    - Israel (0.04)
    - UAE > Abu Dhabi Emirate
      - Abu Dhabi (0.04)

Genre:
- Research Report (0.82)

Industry:
- Law Enforcement & Public Safety > Terrorism (1.00)
- Law > Civil Rights & Constitutional Law (1.00)

Technology:
- Information Technology
  - Communications > Social Media (1.00)
  - Artificial Intelligence
    - Natural Language (1.00)
    - Machine Learning > Performance Analysis
      - Accuracy (0.68)
