LMR-BENCH: Evaluating LLM Agent's Ability on Reproducing Language Modeling Research