RefactorBench: Evaluating Stateful Reasoning in Language Agents Through Code
