The Illusion of Procedural Reasoning: Measuring Long-Horizon FSM Execution in LLMs