What I cannot execute, I do not understand: Training and Evaluating LLMs on Program Execution Traces
