Automated Benchmark Generation for Repository-Level Coding Tasks