The Illusion of Procedural Reasoning: Measuring Long-Horizon FSM Execution in LLMs
