RExBench: Can coding agents autonomously implement AI research extensions?
