Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values

Bang, Yejin, Yu, Tiezheng, Madotto, Andrea, Lin, Zhaojiang, Diab, Mona, Fung, Pascale

Oct-14-2022, arXiv.org Artificial Intelligence

Many NLP classification tasks, such as sexism/racism detection or toxicity detection, are based on human values. Yet human values can vary under diverse cultural conditions. Therefore, we introduce a framework for value-aligned classification that makes predictions based on human values written explicitly in the command. Along with the task, we propose a practical approach that distills value-aligned knowledge from large-scale language models (LLMs) to construct value-aligned classifiers in two steps. First, we generate value-aligned training data from LLMs by prompt-based few-shot learning. Next, we fine-tune smaller classification models on the generated data for the task. Empirical results show that our VA-Models surpass multiple baselines by at least 15.56% on the F1-score, including few-shot learning with OPT-175B and existing text augmentation methods. We suggest that using classifiers with explicit human value input improves both inclusivity and explainability in AI.
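The two-step approach described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the prompt layout, the `Example` fields, the label strings, and the `[SEP]` joining convention are hypothetical and are not taken from the paper. The sketch only shows how a few-shot prompt conditioned on an explicit value might be assembled for data generation, and how a generated sample might be turned into a training row for a smaller classifier. The LLM call and the fine-tuning step are left out.

```python
from dataclasses import dataclass


@dataclass
class Example:
    value: str   # explicitly written human value (hypothetical field)
    text: str    # text to be judged under that value
    label: str   # judgement of the text under that value (hypothetical label set)


def build_generation_prompt(value: str, label: str, shots: list[Example]) -> str:
    """Format few-shot demonstrations, then an open slot for the LLM to complete."""
    blocks = [
        f"Value: {s.value}\nLabel: {s.label}\nText: {s.text}" for s in shots
    ]
    # The final block leaves "Text:" empty so the LLM writes a new sample
    # conditioned on the requested value and label.
    blocks.append(f"Value: {value}\nLabel: {label}\nText:")
    return "\n\n".join(blocks)


def to_training_row(value: str, label: str, generated_text: str) -> dict[str, str]:
    """Pair the value with the generated text as classifier input; the label is the target."""
    # Keep only the first line, in case the LLM continues into another demonstration.
    first_line = generated_text.strip().split("\n")[0].strip()
    return {"input": f"{value} [SEP] {first_line}", "label": label}
```

In use, the string returned by `build_generation_prompt` would be sent to an LLM, and each completion passed through `to_training_row` to build the dataset on which the smaller model is fine-tuned. Placing the value in the classifier input is what lets the same text receive different labels under different values.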


Country:
- North America
  - Dominican Republic (0.04)
  - United States
    - Virginia (0.04)
    - New York > New York County
      - New York City (0.04)
- Europe > Italy
  - Calabria > Catanzaro Province > Catanzaro (0.04)
- Asia > China
  - Hong Kong (0.04)

Genre:
- Research Report
  - New Finding (0.66)
  - Experimental Study (0.46)

Industry:
- Health & Medicine (0.93)
- Law (0.88)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning (1.00)
  - Issues > Social & Ethical Issues (0.93)
