Don't Take Things Out of Context: Attention Intervention for Enhancing Chain-of-Thought Reasoning in Large Language Models

Shaotian Yan, Chen Shen, Wenxiao Wang, Liang Xie, Junjie Liu, Jieping Ye

arXiv.org Artificial Intelligence 

Few-shot Chain-of-Thought (CoT) prompting significantly enhances the reasoning capabilities of large language models (LLMs), functioning as a whole to guide these models in generating reasoning steps toward final answers. However, we observe that isolated segments, words, or tokens within CoT demonstrations can unexpectedly disrupt the generation process of LLMs. The model may overly concentrate on certain local information in the demonstration, introducing irrelevant noise into the reasoning process and potentially leading to incorrect answers. In this paper, we investigate the underlying mechanism of CoT by dynamically tracing and manipulating the inner workings of LLMs at each output step. This analysis demonstrates that tokens exhibiting specific attention characteristics are more likely to induce the model to take things out of context: these tokens attend directly to the hidden states tied to prediction, without substantial integration of non-local information. Building upon these insights, we propose a Few-shot Attention Intervention method (FAI) that dynamically analyzes the attention patterns of demonstrations to accurately identify these tokens and then makes targeted adjustments to the attention weights to suppress their distracting effect on LLMs. Comprehensive experiments across multiple benchmarks demonstrate consistent improvements over baseline methods, including a notable 5.91% improvement on the AQuA dataset, further highlighting the effectiveness of FAI.

The most prevalent paradigm of CoT is few-shot CoT, which comprises a handful of demonstrations, each consisting of a query paired with a reasoning chain. In practice, however, the performance of LLMs is sensitive to the selection of CoT demonstrations (Huang et al., 2023; Rubin et al., 2021; Luo et al., 2023; Liu et al., 2023; Su et al., 2022): employing different CoT exemplars can cause considerable variation in overall accuracy. We further demonstrate that even when overall accuracy rates are comparable, different CoT demonstrations can lead to substantial differences in which specific questions are answered correctly versus incorrectly. Yet the underlying cause of these performance variations remains largely unclear.

[Figure: example few-shot CoT demonstrations, including questions such as "Jenn is saving up money to buy a bike. She has 5 jars full of quarters. Each jar can hold 160 quarters.", "Agatha has $60 to spend on a new bike.", and "Mary has 6 jars of sprinkles in her pantry.", paired with worked answers such as "Jenn has 5 * 160 = <<5*160=800>>800 quarters." and "Agatha spends 15+25=<<15+25=40>>40 dollars."]
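This abstract does not spell out FAI's exact identification criterion or adjustment rule, but the two-step shape it describes (flag demonstration tokens whose attention stays local without integrating non-local information, then down-weight attention toward them) can be sketched as follows. This is a minimal illustrative sketch, not the paper's method: the function names, the locality `window`, the `threshold`, and the scaling factor `alpha` are all assumptions made here for concreteness.

```python
import torch

def flag_local_tokens(attn_weights, demo_len, window=2, threshold=0.8):
    """Hypothetical criterion: flag demonstration tokens whose attention
    mass is concentrated in a small local window, i.e. tokens that
    integrate little non-local information.

    attn_weights: (num_heads, seq_len, seq_len) post-softmax attention.
    demo_len: number of prompt tokens belonging to the demonstration.
    Returns a list of flagged token indices.
    """
    attn = attn_weights.mean(dim=0)  # average over heads: (seq_len, seq_len)
    flagged = []
    for i in range(demo_len):
        lo = max(0, i - window)
        local_mass = attn[i, lo:i + 1].sum()       # attention to nearby tokens
        total_mass = attn[i, :i + 1].sum()         # attention to all prior tokens
        if local_mass / total_mass > threshold:
            flagged.append(i)
    return flagged

def suppress_tokens(attn_weights, flagged, alpha=0.1):
    """Scale down attention paid TO flagged tokens, then renormalize
    each row so the weights still sum to 1."""
    attn = attn_weights.clone()
    if flagged:
        attn[:, :, flagged] *= alpha
        attn = attn / attn.sum(dim=-1, keepdim=True)
    return attn

# Toy usage: 4 heads, 12-token prompt, first 8 tokens form the demonstration.
w = torch.softmax(torch.randn(4, 12, 12), dim=-1)
adjusted = suppress_tokens(w, flag_local_tokens(w, demo_len=8))
```

Renormalizing after suppression keeps each row a valid attention distribution, so the intervention redistributes probability mass toward non-flagged tokens rather than simply deleting it; whether FAI renormalizes this way, or intervenes before the softmax, is not stated in the abstract.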