Optimizing LLM Code Suggestions: Feedback-Driven Timing with Lightweight State Bounds
Mohammad Nour Al Awad, Sergey Ivanov, Olga Tikhonova
arXiv.org Artificial Intelligence
Abstract--Large Language Models (LLMs) have transformed code auto-completion by generating context-aware suggestions. Yet deciding when to present these suggestions remains under-explored, often leading to interruptions or wasted inference calls. We propose an adaptive timing mechanism that dynamically adjusts the delay before offering a suggestion based on real-time developer feedback. Our method combines a logistic transform of recent acceptance rates with a bounded delay range, anchored by a high-level binary prediction of the developer's cognitive state. In a two-month deployment with professional developers, our system improved suggestion acceptance from 4.9% (no delay) to 15.4% (static delays) and 18.6% (adaptive timing), while reducing blind rejections (rejections without being read) from 8.3% to 0.36%. Together, these improvements increase acceptance and reduce wasted inference calls by 75%, making LLM-based code assistants more efficient and cost-effective in practice.

Modern software development increasingly relies on AI-powered code assistants--most prominently LLM-based tools such as GitHub Copilot--which leverage massive pre-trained models to suggest context-aware completions and entire code snippets [1], [2]. These systems aim to boost productivity by reducing boilerplate and aiding API recall. Subsequent work on specialized code models (e.g., CodeBERT and CodeT5) has further improved completion accuracy and relevance [3]-[5]. Despite advances in what content to generate, the timing of suggestion delivery remains an underexplored yet critical factor.
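The abstract describes a logistic transform of recent acceptance rates mapped into a bounded delay range, anchored by a binary cognitive-state prediction. A minimal sketch of one plausible form of that mechanism is shown below; all parameter names and values (`d_min`, `d_max`, `k`, `r0`, the focus heuristic) are illustrative assumptions, not taken from the paper.

```python
import math

def adaptive_delay(acceptance_rate, focused=False,
                   d_min=0.2, d_max=2.0, k=10.0, r0=0.5):
    """Sketch: map a recent acceptance rate (0..1) to a suggestion delay.

    Assumption: higher recent acceptance implies suggestions are welcome,
    so the delay shrinks toward d_min; low acceptance pushes the delay
    toward d_max. A binary cognitive-state flag ('focused') holds the
    delay up so suggestions interrupt deep work less often.
    """
    # Logistic transform of the recent acceptance rate (illustrative
    # steepness k and midpoint r0).
    s = 1.0 / (1.0 + math.exp(-k * (acceptance_rate - r0)))
    # Interpolate within the bounded delay range [d_min, d_max].
    delay = d_max - s * (d_max - d_min)
    if focused:
        # Predicted deep-focus state: never drop below the midpoint delay.
        delay = max(delay, (d_min + d_max) / 2)
    # Clamp to the bounds regardless of parameters.
    return min(max(delay, d_min), d_max)
```

For example, a developer who has accepted nearly every recent suggestion gets a delay near the lower bound, while one rejecting most suggestions sees delays near the upper bound, throttling inference calls that would likely be wasted.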
Nov-25-2025