cantnlp@LT-EDI-2023: Homophobia/Transphobia Detection in Social Media Comments using Spatio-Temporally Retrained Language Models

Wong, Sidney G. -J., Durward, Matthew, Adams, Benjamin, Dunn, Jonathan

Aug-24-2023–arXiv.org Artificial Intelligence

This paper describes our multiclass classification system developed as part of the LT-EDI@RANLP-2023 shared task. We used a BERT-based language model to detect homophobic and transphobic content in social media comments across five language conditions: English, Spanish, Hindi, Malayalam, and Tamil. We retrained a transformer-based cross-language pretrained language model, XLM-RoBERTa, with spatially and temporally relevant social media language data. We also retrained a subset of models with simulated script-mixed social media language data, with varied performance. We developed the best-performing seven-label classification system for Malayalam based on weighted macro-averaged F1 score (ranked first out of six), with variable performance for other language and class-label conditions. We found that the inclusion of this spatio-temporal data improved the classification performance for all language and task conditions when compared with the baseline. The results suggest that transformer-based language classification systems are sensitive to register-specific and language-specific retraining.
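The abstract ranks systems by a weighted, averaged F1 score but does not spell out its definition. As a minimal sketch, assuming the common convention in which per-class F1 scores are averaged either uniformly (macro) or in proportion to each class's number of true instances (weighted), the two variants could be computed as follows. The function names and the example labels are hypothetical and are not taken from the paper or the shared task.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's share of true labels."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)  # labels absent from y_true get weight 0
    return sum(scores[label] * support[label] / len(y_true) for label in scores)


# Hypothetical two-class example (labels are illustrative only).
y_true = ["none", "none", "none", "homophobia"]
y_pred = ["none", "none", "homophobia", "homophobia"]
print(macro_f1(y_true, y_pred), weighted_f1(y_true, y_pred))
```

With imbalanced classes the two averages can diverge noticeably, because the weighted variant lets frequent classes dominate while the macro variant treats a rare class the same as a common one.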

language condition, large language model, machine learning


Country:
- Oceania > New Zealand
  - South Island > Canterbury Region > Christchurch (0.04)
- Europe > Ireland
  - Leinster > County Dublin > Dublin (0.05)
- Asia
  - Middle East > Jordan (0.04)
  - India (0.04)

Genre:
- Research Report (0.84)

Technology:
- Information Technology
  - Communications > Social Media (1.00)
  - Artificial Intelligence
    - Natural Language > Large Language Model (0.56)
    - Machine Learning > Neural Networks
      - Deep Learning (0.70)
