When Names Disappear: Revealing What LLMs Actually Understand About Code

Cuong Chi Le, Minh V. T. Pham, Cuong Duc Van, Hoang N. Phan, Huy N. Phan, Tien N. Nguyen

arXiv.org Artificial Intelligence 

Large Language Models (LLMs) achieve strong results on code tasks, but how they derive program meaning remains unclear. We argue that code communicates through two channels: structural semantics, which define formal behavior, and human-interpretable naming, which conveys intent. To disentangle these effects, we introduce a suite of semantics-preserving obfuscations and show that they expose identifier leakage across both summarization and execution. Surprisingly, we also observe consistent reductions on execution tasks that should depend only on structure, revealing that current benchmarks reward memorization of naming patterns rather than genuine semantic reasoning.

Large language models (LLMs) now achieve striking results across code intelligence--program synthesis, repair, summarization, and test generation. Yet how these models derive meaning from source code remains unclear. If an LLM truly understands a program's intent, its behavior should remain stable when human-interpretable names are perturbed while semantics stay fixed; conversely, strong performance drops would indicate an overreliance on surface cues rather than semantic reasoning.
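To make the idea of a semantics-preserving obfuscation concrete, here is a minimal sketch (not the authors' actual tool) that renames function names, parameters, and locally bound variables to opaque identifiers while leaving program behavior unchanged; the class and naming scheme (`v0`, `v1`, ...) are illustrative assumptions:

```python
import ast

class Obfuscator(ast.NodeTransformer):
    """Rename locally bound identifiers to opaque names, preserving semantics."""

    def __init__(self):
        self.mapping = {}

    def _rename(self, name):
        # Assign each distinct identifier a fresh opaque name on first sight.
        if name not in self.mapping:
            self.mapping[name] = f"v{len(self.mapping)}"
        return self.mapping[name]

    def visit_FunctionDef(self, node):
        node.name = self._rename(node.name)
        for arg in node.args.args:
            arg.arg = self._rename(arg.arg)
        self.generic_visit(node)
        return node

    def visit_Name(self, node):
        # Rename identifiers at binding sites; for reads, only rename names
        # we have already mapped, so builtins like sum/len keep their meaning.
        if isinstance(node.ctx, ast.Store):
            node.id = self._rename(node.id)
        elif node.id in self.mapping:
            node.id = self.mapping[node.id]
        return node

src = """
def average(values):
    total = sum(values)
    return total / len(values)
"""
obfuscated = ast.unparse(Obfuscator().visit(ast.parse(src)))
print(obfuscated)
```

Running the original and obfuscated versions on the same inputs yields identical results, which is exactly the property such a probe relies on: any drop in an LLM's summarization or execution accuracy on the obfuscated variant can then be attributed to lost naming cues rather than changed behavior.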