Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning

Dec-25-2025, 01:04:05 GMT – Neural Information Processing Systems

Chain-of-thought (CoT) prompting and tool augmentation have been validated in recent work as effective practices for improving the step-by-step reasoning of large language models (LLMs) on complex math-related tasks. However, most existing math reasoning datasets may not fully evaluate and analyze the ability of LLMs to manipulate tools and perform reasoning: they often require only a few tool invocations or lack annotations for evaluating intermediate reasoning steps, and thus support only outcome evaluation. To address this issue, we construct CARP

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.84)