LMR-BENCH: Evaluating LLM Agent's Ability on Reproducing Language Modeling Research
