PejorativITy: Disambiguating Pejorative Epithets to Improve Misogyny Detection in Italian Tweets

Muti, Arianna, Ruggeri, Federico, Toraman, Cagri, Musetti, Lorenzo, Algherini, Samuel, Ronchi, Silvia, Saretto, Gianmarco, Zapparoli, Caterina, Barrón-Cedeño, Alberto

Apr-3-2024–arXiv.org Artificial Intelligence

Misogyny is often expressed through figurative language. Some neutral words can assume a negative connotation when functioning as pejorative epithets. Disambiguating the meaning of such terms might help the detection of misogyny. To address this task, we present PejorativITy, a novel corpus of 1,200 manually annotated Italian tweets, labeled for pejorative language at the word level and for misogyny at the sentence level. We evaluate the impact of injecting information about disambiguated words into a model targeting misogyny detection. In particular, we explore two approaches for injection: concatenation of pejorative information and substitution of ambiguous words with univocal terms. Our experimental results, both on our corpus and on two popular benchmarks of Italian tweets, show that both approaches lead to a major classification improvement, indicating that word sense disambiguation is a promising preliminary step for misogyny detection. Furthermore, we investigate LLMs' understanding of pejorative epithets by means of contextual word embedding analysis and prompting.
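
The two injection approaches named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the " [SEP] " separator, the "pejorative"/"neutral" label strings, and the toy lexicon are all assumptions made for illustration, and the example text uses placeholder words rather than entries from the corpus.

```python
import re
from typing import Mapping


def inject_by_concatenation(tweet: str, word: str, is_pejorative: bool,
                            sep: str = " [SEP] ") -> str:
    """Append the target word and its disambiguated label to the tweet.

    The separator and label strings are illustrative assumptions.
    """
    label = "pejorative" if is_pejorative else "neutral"
    return f"{tweet}{sep}{word}: {label}"


def inject_by_substitution(tweet: str, word: str, is_pejorative: bool,
                           lexicon: Mapping[str, str]) -> str:
    """Replace an ambiguous word with a univocal term when it is pejorative.

    `lexicon` is a hypothetical mapping from ambiguous words (lowercase)
    to univocal replacements. The tweet is returned unchanged when the
    word is used neutrally or has no entry in the lexicon.
    """
    replacement = lexicon.get(word.lower())
    if not is_pejorative or replacement is None:
        return tweet
    # Match the whole word only, ignoring case.
    pattern = re.compile(rf"\b{re.escape(word)}\b", flags=re.IGNORECASE)
    # A function replacement avoids interpreting backslashes in the term.
    return pattern.sub(lambda _: replacement, tweet)


# Placeholder lexicon and text, not taken from the corpus.
toy_lexicon = {"ambiguous": "univocal"}
print(inject_by_concatenation("an ambiguous word here", "ambiguous", True))
print(inject_by_substitution("an ambiguous word here", "ambiguous", True, toy_lexicon))
```

In both cases the output string would then be passed to the misogyny classifier in place of the raw tweet. The word-level pejorative label is assumed to come from a separate disambiguation step.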

disambiguation, misogyny detection, tweet

Country:
- Pacific Ocean (0.04)
- North America
  - Dominican Republic (0.04)
  - United States
    - Washington > King County
      - Seattle (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.04)
  - Canada
    - Ontario > Toronto (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.04)
- Europe
  - Italy
    - Apulia > Bari (0.04)
    - Piedmont > Turin Province
      - Turin (0.04)
    - Emilia-Romagna
      - Metropolitan City of Bologna > Bologna (0.04)
      - Modena Province > Modena (0.04)
  - France > Provence-Alpes-Côte d'Azur
    - Bouches-du-Rhône > Marseille (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia
  - China > Hong Kong (0.04)
  - Middle East > Republic of Türkiye
    - Ankara Province > Ankara (0.04)

Genre:
- Research Report > New Finding (0.68)

Industry:
- Information Technology > Services (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (0.68)
    - Performance Analysis > Accuracy (0.46)
