Detecting Unintended Social Bias in Toxic Language Datasets

Sahoo, Nihar; Gupta, Himanshu; Bhattacharyya, Pushpak

Oct-21-2022–arXiv.org Artificial Intelligence

Warning: This paper has contents which may be offensive or upsetting; however, this cannot be avoided owing to the nature of the work. With the rise of online hate speech, automatic detection of hate speech and offensive text as a natural language processing task is becoming popular. However, very little research has been done to detect unintended social bias in these toxic language datasets. This paper introduces a new dataset, ToxicBias, curated from the existing dataset of the Kaggle competition "Jigsaw Unintended Bias in Toxicity Classification". We aim to detect social biases, their categories, and targeted groups. The dataset contains instances annotated for five different bias categories, viz., gender, race/ethnicity, religion, political, and …

Figure 1: An illustrative example of ToxicBias. During the annotation process, hate speech/offensive text is provided without context. Annotators are asked to mark it as biased/neutral and to provide category, target, and implication if it has biases.
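The annotation scheme described in the Figure 1 caption amounts to a simple record: a label (biased or neutral) plus three fields (category, target, implication) that are filled in for biased instances. The Python sketch below models one such record and checks that rule. The class name `Annotation`, its field names, and the label strings are hypothetical illustrations, not the dataset's actual column names; the category set holds only the four categories named in the abstract excerpt, since the fifth is cut off there.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical label strings; the released dataset may encode labels differently.
BIASED = "biased"
NEUTRAL = "neutral"

# Only the four categories named in the abstract excerpt; the fifth is elided there.
KNOWN_CATEGORIES: FrozenSet[str] = frozenset(
    {"gender", "race/ethnicity", "religion", "political"}
)


@dataclass
class Annotation:
    """One annotated instance: a toxic text shown to annotators without context."""

    text: str
    label: str                         # BIASED or NEUTRAL
    category: Optional[str] = None     # bias category, required when label is BIASED
    target: Optional[str] = None       # targeted group, required when label is BIASED
    implication: Optional[str] = None  # implied bias as free text, required when label is BIASED

    def validate(self, allowed_categories: FrozenSet[str] = KNOWN_CATEGORIES) -> None:
        """Raise ValueError if the record violates the sketched annotation rules."""
        if self.label not in (BIASED, NEUTRAL):
            raise ValueError(f"unknown label: {self.label!r}")
        if self.label == BIASED:
            # Per the Figure 1 caption, biased texts get category, target and implication.
            missing = [
                name
                for name in ("category", "target", "implication")
                if getattr(self, name) is None
            ]
            if missing:
                raise ValueError(f"biased instance missing: {', '.join(missing)}")
            if self.category not in allowed_categories:
                raise ValueError(f"unknown category: {self.category!r}")
```

For example, `Annotation(text="...", label=NEUTRAL).validate()` passes, while a biased record without a target raises `ValueError`. Passing a different `allowed_categories` set lets the same check follow whatever category list the released data actually uses.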

artificial intelligence, machine learning, natural language, …

Country:
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States
  - Pennsylvania (0.04)
  - New York > New York County
    - New York City (0.04)
  - Minnesota > Hennepin County
    - Minneapolis (0.14)
  - California > San Diego County
    - San Diego (0.04)
- Europe
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - France > Provence-Alpes-Côte d'Azur
    - Bouches-du-Rhône > Marseille (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia
  - India (0.04)
  - Middle East > Qatar
    - Ad-Dawhah > Doha (0.04)
- Africa > Eswatini
  - Manzini > Manzini (0.04)

Genre:
- Research Report (0.82)

Industry:
- Law > Civil Rights & Constitutional Law (0.68)
- Information Technology (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (0.95)
    - Performance Analysis > Accuracy (0.94)