Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation

Zhang, Haoran, Li, Yafu, Hu, Xuyang, Liu, Dongrui, Wang, Zhilin, Li, Bo, Cheng, Yu

Oct-7-2025–arXiv.org Artificial Intelligence

Large language models (LLMs) are increasingly applied in diverse real-world scenarios, each governed by bespoke behavioral and safety specifications (spec) tailored by users or organizations. These spec, categorized into safety-spec and behavioral-spec, vary across scenarios and evolve with changing preferences and requirements. We formalize this challenge as specification alignment, which focuses on the ability of LLMs to follow dynamic, scenario-specific spec from both behavioral and safety perspectives. To address this challenge, we propose Align3, a lightweight method that employs Test-Time Deliberation (TTD) with hierarchical reflection and revision to reason over specification boundaries. We further present SpecBench, a unified benchmark for measuring specification alignment that covers 5 scenarios, 103 spec, and 1,500 prompts. Experiments on 15 reasoning and 18 instruct models with several TTD methods, including Self-Refine, TPO, and MoreThink, yield three key findings: (i) test-time deliberation enhances specification alignment; (ii) Align3 advances the safety-helpfulness trade-off frontier with minimal overhead; and (iii) SpecBench effectively reveals alignment gaps. These results highlight the potential of test-time deliberation as an effective strategy for reasoning over real-world specification boundaries.
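To make the idea of test-time deliberation with reflection and revision concrete, here is a minimal, generic sketch. It is not the Align3 procedure, whose details the abstract does not give. The function name `deliberate`, the prompt wording, the "OK" stop signal, and the choice to check safety-spec before behavioral-spec are all illustrative assumptions, and `model` stands for any hypothetical callable that maps a prompt string to a completion string.

```python
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt string to a completion string.
Model = Callable[[str], str]


def deliberate(
    model: Model,
    prompt: str,
    safety_specs: List[str],
    behavioral_specs: List[str],
    max_rounds: int = 2,
) -> str:
    """Draft a response, then reflect and revise against each tier of spec.

    Assumption: "hierarchical" is read here as checking the safety tier
    before the behavioral tier. The abstract does not specify the order.
    """
    # Step 1: draft an initial answer with no deliberation.
    response = model(f"TASK: answer the user.\nUSER: {prompt}")

    tiers = [("safety", safety_specs), ("behavioral", behavioral_specs)]
    for tier, specs in tiers:
        spec_text = "\n".join(f"- {s}" for s in specs)
        for _ in range(max_rounds):
            # Step 2: ask the model to critique the current response.
            critique = model(
                f"TASK: reflect.\nTIER: {tier}\nSPECS:\n{spec_text}\n"
                f"USER: {prompt}\nRESPONSE: {response}\n"
                "Reply 'OK' if every spec is met, else describe the violation."
            )
            if critique.strip() == "OK":
                break  # this tier is satisfied; move to the next one
            # Step 3: revise the response using the critique.
            response = model(
                f"TASK: revise.\nTIER: {tier}\nSPECS:\n{spec_text}\n"
                f"USER: {prompt}\nRESPONSE: {response}\nCRITIQUE: {critique}"
            )
    return response
```

The extra cost of a loop like this is bounded by `max_rounds` reflect-and-revise calls per tier, which is one way to keep deliberation overhead small.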

large language model, machine learning, natural language

Country:
- Asia > China (0.28)
- North America > United States (0.27)

Genre:
- Research Report > New Finding (0.92)

Industry:
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Law > Criminal Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
- Education (0.67)
- Health & Medicine
  - Therapeutic Area > Psychiatry/Psychology (1.00)
  - Pharmaceuticals & Biotechnology (1.00)
  - Consumer Health (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
