Exploratory Data Analysis on Code-mixed Misogynistic Comments

Yadav, Sargam, Kaushik, Abhishek, McDaid, Kevin

Mar-9-2024, arXiv.org Artificial Intelligence

Online hate speech and cyberbullying have worsened significantly as social media platforms such as YouTube and Twitter (X) have grown in popularity. Natural Language Processing (NLP) techniques have proven highly useful for automatically filtering such toxic content. Women are disproportionately likely to be victims of online abuse. However, few studies appear to tackle misogyny detection in under-resourced languages. In this short paper, we present a novel dataset of code-mixed Hinglish comments collected from YouTube videos, which have been weakly labelled as 'Misogynistic' and 'Non-misogynistic'. Pre-processing and Exploratory Data Analysis (EDA) techniques have been applied to the dataset to gain insights into its characteristics. The process has provided a better understanding of the dataset through sentiment scores, word clouds, etc.
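As a rough illustration of the kind of pre-processing and word-frequency analysis that typically precedes a word cloud, the sketch below cleans a few comments and counts the remaining tokens. It is a minimal example under stated assumptions, not the authors' pipeline: the sample comments, the stopword list, and the function names `preprocess` and `word_frequencies` are all hypothetical placeholders.

```python
import re
from collections import Counter

# Hypothetical placeholder comments; NOT taken from the paper's dataset.
comments = [
    "Yeh video bahut accha hai!!! https://example.com",
    "Bahut accha explain kiya, thank you :)",
    "video thoda lamba hai but accha hai",
]

# Tiny illustrative stopword list (an assumption); a real list for
# code-mixed Hinglish would be much larger.
STOPWORDS = {"hai", "yeh", "but", "kiya", "you"}


def preprocess(text):
    """Lowercase, strip URLs, keep alphabetic tokens, drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    tokens = re.findall(r"[a-z]+", text)       # romanized words only
    return [t for t in tokens if t not in STOPWORDS]


def word_frequencies(texts):
    """Count token occurrences across all comments."""
    counts = Counter()
    for text in texts:
        counts.update(preprocess(text))
    return counts


freqs = word_frequencies(comments)
print(freqs.most_common(3))
```

The resulting frequency table is the usual input to a word-cloud renderer; computing it separately for each label would allow the vocabulary of the two classes to be compared.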

dataset, detection, misogyny detection, (15 more...)


Country:
- Europe > Ireland (0.05)
- Asia > India (0.05)

Genre:
- Research Report (1.00)

Industry:
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.71)
- Information Technology > Security & Privacy (0.55)
- Media > News (0.47)

Technology:
- Information Technology
  - Data Science (1.00)
  - Communications > Social Media (1.00)
  - Artificial Intelligence
    - Natural Language (1.00)
    - Machine Learning > Statistical Learning (0.47)
