Reasoning's Razor: Reasoning Improves Accuracy but Can Hurt Recall at Critical Operating Points in Safety and Hallucination Detection

Atoosa Chegini, Hamid Kazemi, Garrett Souza, Maria Safi, Yang Song, Samy Bengio, Sinead Williamson, Mehrdad Farajtabar

arXiv.org Artificial Intelligence 

In precision-sensitive classification tasks, false positives carry severe operational consequences. For example, when a text safety classifier incorrectly flags 10% of benign user queries as unsafe, it blocks legitimate queries from being processed, degrading the experience for millions of users and potentially driving them away from the service. Similarly, in hallucination detection within Retrieval-Augmented Generation (RAG) pipelines, when factually correct responses are incorrectly flagged as hallucinated, the system triggers regeneration or self-correction mechanisms, adding unnecessary computational overhead and latency that frustrates users waiting for responses. These deployment realities demand classifiers that operate at extremely low false positive rates, often below 1%, while maintaining acceptable recall. Large language models are increasingly deployed for such precision-critical classification tasks through specialized safety guardrails such as Llama Guard (Inan et al., 2023) and ShieldGemma (Zeng et al., 2024), as well as hallucination detection systems (Huang et al., 2025). Recently, reasoning-augmented approaches have emerged as a promising direction: GuardReasoner (Liu et al., 2025) incorporates chain-of-thought reasoning for safety classification, while Lynx (Ravi et al., 2024) leverages reasoning for hallucination detection in RAG.
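The low-FPR operating point described above can be made concrete with a small sketch (not from the paper): given per-example scores and binary labels, pick the strictest threshold whose false positive rate stays at or below a budget (e.g., 1%), then report the recall achieved there. The function name and the convention that higher scores mean "more likely unsafe/hallucinated" are illustrative assumptions.

```python
def recall_at_fpr(labels, scores, max_fpr=0.01):
    """Recall (TPR) at the strictest threshold whose FPR stays <= max_fpr.

    labels: 1 = positive (unsafe / hallucinated), 0 = negative (benign).
    scores: higher means the classifier is more confident the item is positive.
    Names and conventions here are illustrative, not from the paper.
    """
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    positives = [s for s, y in zip(scores, labels) if y == 1]
    # Allow at most a max_fpr fraction of negatives to be (wrongly) flagged.
    n_allowed = int(max_fpr * len(negatives))
    # Flag only scores strictly above this negative score, so at most
    # n_allowed negatives land above the threshold.
    threshold = negatives[n_allowed] if n_allowed < len(negatives) else float("-inf")
    fpr = sum(s > threshold for s in negatives) / len(negatives)
    recall = sum(s > threshold for s in positives) / len(positives)
    return recall, fpr, threshold
```

The key point the abstract motivates is visible here: at a 1% FPR budget only a handful of the highest-scoring negatives may be crossed, so recall at that threshold can be far below the recall implied by overall accuracy.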