DATE-LM: Benchmarking Data Attribution Evaluation for Large Language Models