Learning to love diligent trolls: Accounting for rater effects in the dialogue safety task

Oct-30-2023, arXiv.org Artificial Intelligence

Chatbots risk generating offensive utterances, which must be avoided. Post-deployment, one way for a chatbot to improve continuously is to source utterance/label pairs from the feedback of live users. However, some of those users are trolls, who provide training examples with incorrect labels. To de-troll training data, previous work removed training examples with high user-aggregated cross-validation (CV) error. However, CV is expensive, and in a coordinated attack it may be overwhelmed by trolls who are both numerous and consistent among themselves. In the present work, I address both limitations by proposing a solution inspired by methodology in automated essay scoring (AES): have multiple users rate each utterance, then perform latent class analysis (LCA) to infer the correct labels. Because it does not require GPU computations, LCA is inexpensive. In experiments, I found that the AES-like solution can infer training labels with high accuracy when trolls are consistent, even when trolls are the majority.
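
To make the idea of inferring labels from multiple raters concrete, here is a minimal sketch of a two-class latent class model fitted by expectation-maximization on binary ratings. It is an illustration under stated assumptions, not the paper's implementation: the function names `fit_lca` and `align_with_gold`, the toy ratings, the convention that 1 means "unsafe", and the use of two gold-labeled utterances to decide which latent class is which are all hypothetical choices made for this example. The abstract does not say how class identity is resolved, and a latent class model on its own identifies the classes only up to relabeling.

```python
import math

def fit_lca(ratings, n_iter=50, eps=1e-6):
    """Two-class latent class model fitted by EM (illustrative sketch).

    ratings[i][j] is rater j's binary label for utterance i.
    Returns, per utterance, the posterior probability of latent class 1.
    """
    n, m = len(ratings), len(ratings[0])
    clip = lambda x: min(max(x, eps), 1.0 - eps)
    # Start from a soft majority vote: each utterance's mean rating.
    post = [min(max(sum(row) / m, 0.05), 0.95) for row in ratings]
    for _ in range(n_iter):
        # M-step: class prevalence and each rater's P(rating = 1 | class).
        w1 = max(sum(post), eps)
        w0 = max(n - sum(post), eps)
        pi = clip(w1 / n)
        p1 = [clip(sum(post[i] * ratings[i][j] for i in range(n)) / w1)
              for j in range(m)]
        p0 = [clip(sum((1 - post[i]) * ratings[i][j] for i in range(n)) / w0)
              for j in range(m)]
        # E-step: posterior class membership of each utterance.
        for i in range(n):
            d = math.log(pi) - math.log(1 - pi)  # log-odds of class 1
            for j in range(m):
                if ratings[i][j]:
                    d += math.log(p1[j]) - math.log(p0[j])
                else:
                    d += math.log(1 - p1[j]) - math.log(1 - p0[j])
            # Numerically stable logistic function.
            post[i] = 1 / (1 + math.exp(-d)) if d >= 0 else math.exp(d) / (1 + math.exp(d))
    return post

def align_with_gold(post, gold):
    """Flip the class labels if they disagree with most gold labels.

    gold maps utterance index -> known label (an assumption of this sketch).
    """
    agree = sum((post[i] >= 0.5) == bool(y) for i, y in gold.items())
    return [1 - p for p in post] if 2 * agree < len(gold) else post

# Toy data (hypothetical): 8 utterances, 2 honest raters, 3 trolls.
truth = [1, 0, 1, 0, 1, 1, 0, 0]
ratings = [[t, t, 1 - t, 1 - t, 1 - t] for t in truth]
ratings[2][4] = truth[2]  # the third troll slips once and answers honestly

majority = [int(sum(row) * 2 > len(row)) for row in ratings]
post = align_with_gold(fit_lca(ratings), gold={0: 1, 1: 0})
inferred = [int(p >= 0.5) for p in post]
```

In this toy setup the trolls outnumber the honest raters, so a plain majority vote mislabels most utterances, while the latent class posteriors still split the utterances into two groups cleanly because the trolls are consistent. The gold labels are used only to name the two groups.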

corrupt action, troll corrupt rate, troll prevalence

Country:
- North America
  - United States
    - Texas (0.04)
    - New York > New York County
      - New York City (0.04)
  - Canada > Quebec
    - Montreal (0.14)
- Europe
  - France (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)
- Asia > China
  - Hong Kong (0.04)

Genre:
- Research Report (0.40)

Industry:
- Health & Medicine > Therapeutic Area
  - Psychiatry/Psychology > Mental Health (0.40)
- Education
  - Assessment & Standards > Student Performance (0.56)
  - Educational Technology > Educational Software
    - Computer-Aided Assessment (0.35)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Inductive Learning (1.00)
  - Natural Language > Chatbot (0.70)
