Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?
Neural Information Processing Systems
This paper investigates an under-explored challenge in large language models (LLMs): chain-of-thought prompting with noisy rationales, i.e., irrelevant or inaccurate reasoning thoughts within the examples used for in-context learning. We construct the NoRa dataset, tailored to evaluate the robustness of reasoning in the presence of noisy rationales. Our findings on the NoRa dataset reveal a prevalent vulnerability to such noise among current LLMs, with existing robust methods like self-correction and self-consistency showing limited efficacy. Notably, compared to prompting with clean rationales, GPT-3.5 drops by 1.4%-19.8% in accuracy with irrelevant thoughts and, more drastically, by 2.2%-40.4% with inaccurate thoughts. Addressing this challenge necessitates external supervision that should be accessible in practice. Here, we propose the method of contrastive denoising with noisy chain-of-thought (CD-CoT). It enhances LLMs' denoising-reasoning capabilities by contrasting noisy rationales with only one clean rationale, which can be the minimal requirement for denoising-purpose prompting. This method follows a principle of exploration and exploitation: (1) rephrasing and selecting rationales in the input space to achieve explicit denoising, and (2) exploring diverse reasoning paths and voting on answers in the output space. Empirically, CD-CoT demonstrates an average improvement of 17.8% in accuracy over the base model and shows significantly stronger denoising capabilities than baseline methods.
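The explore-and-exploit loop described in the abstract can be sketched in code. The sketch below is an illustrative assumption, not the authors' implementation: `llm(prompt, temperature)` is a hypothetical completion function, the rationale-selection step is a placeholder heuristic, and the hyperparameter names (`n_rephrase`, `n_select`, `n_paths`) are invented for illustration.

```python
from collections import Counter

def cd_cot(llm, question, noisy_rationales, clean_rationale,
           n_rephrase=3, n_select=2, n_paths=5):
    """Hedged sketch of CD-CoT's two stages, assuming a hypothetical
    `llm(prompt, temperature) -> str` completion function."""
    # Stage 1 (input space): rephrase each noisy rationale by contrasting
    # it with the single clean rationale, producing denoised candidates.
    candidates = []
    for noisy in noisy_rationales:
        for _ in range(n_rephrase):
            prompt = (f"Clean example:\n{clean_rationale}\n\n"
                      f"Rewrite the following rationale in the same style, "
                      f"removing irrelevant or inaccurate steps:\n{noisy}")
            candidates.append(llm(prompt, temperature=0.8))
    # Placeholder selection heuristic; the paper's actual criterion differs.
    selected = candidates[:n_select]

    # Stage 2 (output space): sample diverse reasoning paths with the
    # denoised demonstrations, then majority-vote on the final answers.
    answers = []
    for _ in range(n_paths):
        demos = "\n\n".join(selected)
        reply = llm(f"{demos}\n\nQ: {question}\nA:", temperature=1.0)
        answers.append(reply.strip())
    return Counter(answers).most_common(1)[0][0]
```

With a deterministic stub in place of a real LLM, the function simply returns the majority answer across the sampled paths; with a real model, temperature sampling is what makes the explored paths diverse.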
Mar-27-2025, 12:01:09 GMT