LionGuard: Building a Contextualized Moderation Classifier to Tackle Localized Unsafe Content

Jun-24-2024–arXiv.org Artificial Intelligence

As large language models (LLMs) become increasingly prevalent in a wide variety of applications, concerns about the safety of their outputs have become more significant. Most efforts at safety-tuning or moderation today take on a predominantly Western-centric view of safety, especially for toxic, hateful, or violent speech. In this paper, we describe LionGuard, a Singapore-contextualized moderation classifier that can serve as guardrails against unsafe LLM outputs. When assessed on Singlish data, LionGuard outperforms existing widely-used moderation APIs, which are not finetuned for the Singapore context, by 14% (binary) and up to 51% (multi-label). Our work highlights the benefits of localization for moderation classifiers and presents a practical and scalable approach for low-resource languages.

category, classifier, dataset, (15 more...)

arXiv.org Artificial Intelligence

Jun-24-2024

arXiv.org PDF

Add feedback

Country:
- North America
  - United States
    - Texas > Travis County
      - Austin (0.04)
    - New York > New York County
      - New York City (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
  - Canada > Ontario
    - Toronto (0.04)
- Europe
  - Ireland (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia
  - Singapore (0.47)
  - Taiwan (0.04)
  - Japan (0.04)
  - India > Tamil Nadu
    - Chennai (0.04)
  - China > Hubei Province
    - Wuhan (0.04)

Genre:
- Research Report (0.50)

Industry:
- Law (1.00)
- Health & Medicine (0.94)
- Government (0.93)
- Information Technology (0.67)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (0.96)
    - Performance Analysis > Accuracy (0.67)
    - Learning Graphical Models > Directed Networks
      - Bayesian Learning (0.46)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found