Do LLMs "know" internally when they follow instructions?
Juyeon Heo, Christina Heinze-Deml, Oussama Elachqar, Shirley Ren, Udhay Nallasamy, Andy Miller, Kwan Ho Ryan Chan, Jaya Narain
arXiv.org Artificial Intelligence
Instruction-following is crucial for building AI agents with large language models (LLMs), as these models must adhere strictly to user-provided constraints and guidelines. However, LLMs often fail to follow even simple and clear instructions. To improve instruction-following behavior and prevent undesirable outputs, a deeper understanding of how LLMs' internal states relate to these outcomes is required. Our analysis of LLM internal states reveals a dimension in the input embedding space linked to successful instruction-following. We demonstrate that modifying representations along this dimension improves instruction-following success rates compared to random changes, without compromising response quality. Further investigation reveals that this dimension is more closely related to the phrasing of prompts than to the inherent difficulty of the task or instructions. This discovery also suggests explanations for why LLMs sometimes fail to follow clear instructions and why prompt engineering is often effective even when the content remains largely unchanged. This work provides insight into the internal workings of LLMs' instruction-following, paving the way for reliable LLM agents.

Given the potential of large language models (LLMs), there has been significant interest in using these models to build personal AI agents. For instance, one could imagine deploying an LLM as a personal healthcare assistant, such as a fitness or nutrition planner, or for psychological counseling (Li et al., 2024b; Wang et al., 2023; Tu et al., 2024). Compared to traditional machine learning-based AI agents, LLMs offer the advantage of being easily adaptable through prompting, allowing users to provide guidelines and personal information without the need to retrain model weights. Instruction-following is critical when building personal AI agents with LLMs through prompts, because these models must adhere to the given constraints and guidelines to ensure safe and trustworthy interactions. For example, suppose an LLM is building a personal fitness plan for a user with knee problems; the plan must avoid exercises that would strain the knees.
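The abstract's claim about finding and steering along an internal dimension can be made concrete with a short sketch. The following is a minimal illustration, not the paper's implementation: it fits a linear probe (logistic regression) on prompt hidden states labeled by whether the response followed the instruction, takes the probe's weight vector as the candidate direction, and adds a scaled copy of it to one layer's activations during generation. The model name, layer index, steering strength, and toy labels are all assumptions.

```python
# Minimal sketch (not the authors' released code) of the idea above:
# probe hidden states for a direction separating prompts whose responses
# followed the instruction from those that did not, then steer along it.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-chat-hf"  # assumed model; any causal LM works
LAYER = 15                               # assumed mid-depth transformer block
ALPHA = 4.0                              # assumed steering strength

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

def prompt_embedding(text: str) -> torch.Tensor:
    """Mean-pooled hidden state of the prompt after block LAYER."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    # hidden_states[0] is the embedding layer, so block LAYER's output
    # sits at index LAYER + 1
    return out.hidden_states[LAYER + 1][0].mean(dim=0)

# Placeholder prompts with instruction-following labels; in practice the
# labels would come from an external judge of the model's responses.
prompts = ["Reply in exactly three words: how are you?",
           "Answer without using the letter e: name a fruit."]
followed = [1, 0]

X = torch.stack([prompt_embedding(p) for p in prompts]).numpy()
probe = LogisticRegression(max_iter=1000).fit(X, followed)

# The probe's weight vector is the candidate instruction-following direction.
direction = torch.tensor(probe.coef_[0], dtype=torch.float32)
direction /= direction.norm()

def steer(module, inputs, output):
    """Forward hook: shift the block's hidden states along the direction."""
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + ALPHA * direction.to(device=hidden.device,
                                           dtype=hidden.dtype)
    return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

hook = model.model.layers[LAYER].register_forward_hook(steer)
ids = tok("Reply in exactly three words: how are you?", return_tensors="pt")
steered = model.generate(**ids, max_new_tokens=20)
hook.remove()
print(tok.decode(steered[0], skip_special_tokens=True))
```

Steering every token position at a single layer is the simplest possible intervention; the paper's actual intervention point, strength, and evaluation protocol may well differ.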
Oct-30-2024
- Country:
- Europe > United Kingdom > Scotland (0.14)
- Genre:
- Research Report > New Finding (0.68)
- Industry:
- Health & Medicine (1.00)
- Media > Film (0.46)