Author Response for PHYRE: A New Benchmark for Physical Reasoning
[...] new environment for benchmarking aspects of physical reasoning in which agents are challenged to solve 2D physics [...]
Neural Information Processing Systems
We thank the reviewers for their detailed and constructive comments. Overall, the reviewers were positive about this contribution and liked the submission: "I generally [...] The task is compelling and the benchmark is well thought out." The reviewers also raised concerns, which we address next. For example, in CLEVR it now seems likely that some models (e.g., Relation Networks) have found shortcut "cheats" [...] It is difficult to characterize what constitutes "intrinsic" difficulty, but by [...] As a whole, the community must "go for recall" since [...] By releasing PHYRE to the public, we hope to see rapid exploration of these good suggestions. We will attempt to improve the clarity.
May-31-2025, 15:12:56 GMT