Too Big to Fool: Resisting Deception in Language Models

Mohammad Reza Samsami, Mats Leon Richter, Juan Rodriguez, Megh Thakkar, Sarath Chandar, Maxime Gasse

Dec-13-2024, arXiv.org Artificial Intelligence

Large language models must balance their weight-encoded knowledge with in-context information from prompts to generate accurate responses. This paper investigates this interplay by analyzing how models of varying capacities within the same family handle intentionally misleading in-context information. Our experiments demonstrate that larger models exhibit higher resilience to deceptive prompts, showcasing an advanced ability to interpret and integrate prompt information with their internal knowledge. Furthermore, we find that larger models outperform smaller ones in following legitimate instructions, indicating that their resilience is not due to disregarding in-context information. We also show that this phenomenon is likely not a result of memorization but stems from the models' ability to better leverage implicit task-relevant information from the prompt alongside their internally stored knowledge.
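The comparison described in the abstract can be illustrated with a small evaluation harness. This is only a minimal sketch under stated assumptions, not the authors' protocol: the prompt template, the item fields (`question`, `choices`, `answer`, `wrong`), the `relative_drop` metric, and the `model` callable (any function mapping a prompt string to an answer letter) are all hypothetical names introduced here for illustration.

```python
from typing import Callable, Dict, List, Optional, Sequence

Item = Dict[str, object]  # hypothetical item schema: question, choices, answer, wrong


def build_prompt(question: str, choices: Sequence[str], hint: Optional[str] = None) -> str:
    """Format a multiple-choice question, optionally preceded by an in-context hint."""
    lines = []
    if hint is not None:
        lines.append(f"Hint: {hint}")
    lines.append(f"Question: {question}")
    for label, choice in zip("ABCD", choices):
        lines.append(f"{label}. {choice}")
    lines.append("Answer:")
    return "\n".join(lines)


def accuracy(model: Callable[[str], str], items: List[Item], deceptive: bool = False) -> float:
    """Fraction of items answered correctly, with or without a misleading hint."""
    correct = 0
    for item in items:
        # Assumption: the deceptive hint simply asserts a wrong option label.
        hint = f"The correct answer is {item['wrong']}." if deceptive else None
        prompt = build_prompt(item["question"], item["choices"], hint)
        if model(prompt).strip() == item["answer"]:
            correct += 1
    return correct / len(items)


def relative_drop(model: Callable[[str], str], items: List[Item]) -> float:
    """Share of clean accuracy lost once the misleading hint is added."""
    clean = accuracy(model, items)
    misled = accuracy(model, items, deceptive=True)
    return (clean - misled) / clean if clean else 0.0
```

Under this sketch, a lower `relative_drop` corresponds to higher resilience. Running the same harness with a truthful hint instead of a wrong one would be one way to check that a resilient model still uses in-context information rather than ignoring it.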

large language model, machine learning, natural language

Country:
- South America > Colombia
  - Meta Department > Villavicencio (0.04)
- North America
  - United States
    - New York (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
  - Canada > Quebec
    - Montreal (0.04)
- Asia
  - Singapore (0.04)
  - Indonesia > Bali (0.04)
  - Middle East > Jordan (0.04)

Genre:
- Research Report > New Finding (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.47)