Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought
Chua, James; Rees, Edward; Batra, Hunar; Bowman, Samuel R.; Michael, Julian; Perez, Ethan; Turpin, Miles
arXiv.org Artificial Intelligence
While chain-of-thought prompting (CoT) has the potential to improve the explainability of language model reasoning, it can systematically misrepresent the factors influencing models' behavior, for example by rationalizing answers in line with a user's opinion without mentioning this bias. To mitigate this biased reasoning problem, we introduce bias-augmented consistency training (BCT), an unsupervised fine-tuning scheme that trains models to give consistent reasoning across prompts with and without biasing features. We construct a suite testing nine forms of biased reasoning on seven question-answering tasks, and find that applying BCT to GPT-3.5-Turbo with one bias reduces the rate of biased reasoning by 86% on held-out tasks. Moreover, this model generalizes to other forms of bias, reducing biased reasoning on held-out biases by an average of 37%. As BCT generalizes to held-out biases and does not require gold labels, this method may hold promise for reducing biased reasoning from as-of-yet unknown biases and on tasks where supervision for ground truth reasoning is unavailable.
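The consistency idea in the abstract can be illustrated with a minimal sketch of how one training example might be assembled. This is an assumption-laden illustration, not the authors' implementation: the names `TrainingExample`, `add_suggested_answer_bias`, and `build_consistency_example` are hypothetical, the wording of the biasing feature is invented for the example, and `generate` stands in for any function that returns a model's chain-of-thought response to a prompt. The sketch assumes the target reasoning is sampled from the prompt without the bias and then paired with the prompt that contains it, so no gold label is needed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TrainingExample:
    prompt: str      # prompt used for fine-tuning (contains the biasing feature)
    completion: str  # target reasoning, sampled WITHOUT the biasing feature


def add_suggested_answer_bias(question: str, suggested: str) -> str:
    """Hypothetical biasing feature: the user voices an opinion on the answer."""
    return f"{question}\n\nI think the answer is {suggested}, but I'm curious what you think."


def build_consistency_example(
    question: str,
    suggested: str,
    generate: Callable[[str], str],
) -> TrainingExample:
    """Pair a biased prompt with reasoning produced from the unbiased prompt.

    `generate` is an assumed stand-in for a model call; no ground-truth
    answer is consulted anywhere in this function.
    """
    # 1. Sample reasoning from the clean prompt (no biasing feature present).
    unbiased_reasoning = generate(question)
    # 2. Build the prompt variant that contains the biasing feature.
    biased_prompt = add_suggested_answer_bias(question, suggested)
    # 3. Fine-tuning on this pair pushes the model toward giving the same
    #    reasoning whether or not the bias is present.
    return TrainingExample(prompt=biased_prompt, completion=unbiased_reasoning)
```

A collection of such examples could then be passed to whatever supervised fine-tuning pipeline is in use; the choice of which answer to suggest and how many examples to build is left open here.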
Mar-8-2024