Truth is Universal: Robust Detection of Lies in LLMs

Lennart Bürger

Neural Information Processing Systems 

Large Language Models (LLMs) have revolutionised natural language processing, exhibiting impressive human-like capabilities. In particular, LLMs are capable of "lying": knowingly outputting false statements. It is therefore important to develop methods for detecting when LLMs lie. Indeed, several authors have trained classifiers to detect LLM lies based on the model's internal activations. However, other researchers showed that these classifiers may fail to generalise, for example to negated statements.
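The activation-based detection approach the abstract refers to is straightforward to sketch: extract a fixed hidden layer's representation of each statement and fit a linear probe on top of it. Below is a minimal illustration using Hugging Face transformers and scikit-learn; the model name, layer index, and toy statements are illustrative placeholders, not the paper's actual setup.

```python
# Minimal sketch of activation-based lie detection: represent each statement
# by a hidden-state vector and fit a linear classifier on top of it.
# Model, layer and dataset are placeholders, not the paper's configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL_NAME = "gpt2"  # placeholder; any causal LM with hidden states works
LAYER = 12           # placeholder layer index (gpt2 exposes layers 0-12)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def activation(statement: str) -> torch.Tensor:
    """Hidden state of the last token at a fixed layer."""
    inputs = tokenizer(statement, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.hidden_states[LAYER][0, -1]  # shape: (hidden_dim,)

# Toy labelled data: 1 = true statement, 0 = false statement.
statements = [
    ("Paris is the capital of France.", 1),
    ("Berlin is the capital of France.", 0),
    ("Water boils at 100 degrees Celsius at sea level.", 1),
    ("Water boils at 10 degrees Celsius at sea level.", 0),
]
X = torch.stack([activation(s) for s, _ in statements]).numpy()
y = [label for _, label in statements]

# Linear probe on the activations.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X))  # sanity check on the training data
```

A probe of this kind, trained only on affirmative statements, is precisely the sort of classifier reported to misfire on negated statements, which is the generalisation failure motivating the paper.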
