Abusive Span Detection for Vietnamese Narrative Texts

Nguyen, Nhu-Thanh, Phan, Khoa Thi-Kim, Nguyen, Duc-Vu, Nguyen, Ngan Luu-Thuy

Dec-12-2023–arXiv.org Artificial Intelligence

Abuse in its various forms, including physical, psychological, verbal, sexual, financial, and cultural abuse, has a negative impact on mental health. However, few studies have applied natural language processing (NLP) to this problem in Vietnam. We therefore contribute a human-annotated dataset for detecting abusive content in Vietnamese narrative texts. We sourced these texts from VnExpress, a popular Vietnamese online newspaper, where readers often share stories containing abusive content. Identifying and categorizing abusive spans in these texts posed significant challenges during dataset creation, but it also motivated our research. To assess the complexity of the dataset, we experimented with lightweight baseline models that freeze PhoBERT and XLM-RoBERTa and feed their hidden states into a BiLSTM. In our experiments, the PhoBERT-based model outperforms the other models on both the labeled and unlabeled abusive span detection tasks. These results indicate that there is potential for future improvement.
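The distinction between labeled and unlabeled span detection can be illustrated with a small scoring sketch. The abstract does not specify the tagging scheme or the metric, so the BIO tags, the exact-match span F1, and the label names (VERBAL, PHYSICAL, PSYCHOLOGICAL) below are assumptions chosen for illustration, not the authors' setup.

```python
from typing import List, Set, Tuple

# (start, end_exclusive, label); hypothetical representation of a span.
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of labeled spans.

    Assumption: BIO tagging. A stray I- tag with no matching open span
    is leniently treated as the start of a new span.
    """
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def span_f1(gold: Set[Span], pred: Set[Span], labeled: bool = True) -> float:
    """Exact-match span F1. With labeled=False only boundaries must match."""
    if labeled:
        gold_keys, pred_keys = set(gold), set(pred)
    else:
        gold_keys = {(s, e) for s, e, _ in gold}
        pred_keys = {(s, e) for s, e, _ in pred}
    true_pos = len(gold_keys & pred_keys)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_keys)
    recall = true_pos / len(gold_keys)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the prediction finds both spans but mislabels one.
gold = bio_to_spans(["O", "B-VERBAL", "I-VERBAL", "O", "B-PHYSICAL"])
pred = bio_to_spans(["O", "B-PSYCHOLOGICAL", "I-PSYCHOLOGICAL", "O", "B-PHYSICAL"])

labeled_f1 = span_f1(gold, pred, labeled=True)
unlabeled_f1 = span_f1(gold, pred, labeled=False)
```

In this toy case the unlabeled score is higher than the labeled one, because a span with correct boundaries but the wrong abuse category only counts in the unlabeled setting.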

dataset, detection, span detection

Country:
- North America > United States
  - New York > New York County
    - New York City (0.04)
  - New Mexico > Santa Fe County
    - Santa Fe (0.04)
  - California
    - San Francisco County > San Francisco (0.14)
    - San Diego County > San Diego (0.04)
- Europe
  - Spain (0.04)
  - Greece (0.04)
  - Croatia (0.04)
  - France > Provence-Alpes-Côte d'Azur
    - Bouches-du-Rhône > Marseille (0.04)
- Asia
  - Vietnam > Hồ Chí Minh City
    - Hồ Chí Minh City (0.05)
  - Malaysia > Kuala Lumpur
    - Kuala Lumpur (0.04)
  - China > Shanghai
    - Shanghai (0.04)

Genre:
- Research Report > New Finding (0.68)

Industry:
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)
- Media > News (0.54)
- Health & Medicine > Therapeutic Area
  - Psychiatry/Psychology (0.66)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.68)