Evaluating the Role of Verifiers in Test-Time Scaling for Legal Reasoning Tasks

Romano, Davide, Schwarz, Jonathan, Giofré, Daniele

Oct-31-2025–arXiv.org Artificial Intelligence

Test-time scaling (TTS) techniques can improve the performance of large language models (LLMs) at the expense of additional computation and latency. While TTS has proven effective in formal domains such as mathematics and programming, its value in argumentative domains such as law remains underexplored. We present an empirical study of verifier-based TTS methods for legal multiple-choice QA (MCQA) across five benchmarks. Using a family of 7 reward models, we evaluate both outcome-level (Best-of-$N$) and process-level (tree search) verification under realistic low-$N$ budgets. Our analysis systematically investigates how verifier utility is affected by key properties such as domain specialization, model size, and supervision type (process-supervised PRMs vs. outcome-only ORMs), even when applied across different roles.

arxiv preprint arxiv, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

Oct-31-2025

arXiv.org PDF

Add feedback

Genre:
- Research Report > New Finding (0.68)

Industry:
- Law (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.32)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found