SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
