A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents

Neural Information Processing Systems 

A benchmark

2. Benchmark: For benchmarks, the supplementary materials must ensure that all results are easily reproducible (i.e.