PHYRE: A New Benchmark for Physical Reasoning
Understanding and reasoning about physics is an important ability of intelligent agents. We develop the PHYRE benchmark for physical reasoning that contains a set of simple classical mechanics puzzles in a 2D physical environment. The benchmark is designed to encourage the development of learning algorithms that are sample-efficient and generalize well across puzzles. We test several modern learning algorithms on PHYRE and find that these algorithms fall short in solving the puzzles efficiently. We expect that PHYRE will encourage the development of novel sample-efficient agents that learn efficient but useful models of physics. For code and to play PHYRE for yourself, please visit https://player.phyre.ai.
A new environment for benchmarking aspects of physical reasoning, in which agents are challenged to solve 2D physics puzzles.
We thank the reviewers for their detailed and constructive comments. Overall, the reviewers were positive about this contribution and liked the submission: "I generally […] The task is compelling and the benchmark is well thought out." [R1]; "I like this paper, as it presents […]" [R2]; "The benchmark is designed to encourage physical […]" The reviewers also raised concerns, which we will address next. […] For example, in CLEVR it now seems likely that some models (e.g., Relation Networks) have found shortcut "cheats" […] It is difficult to characterize what constitutes "intrinsic" difficulty, but by […] As a whole, the community must "go for recall" since […] By releasing PHYRE to the public, we hope to see rapid exploration of these good suggestions. We will attempt to improve the clarity.
Reviews: PHYRE: A New Benchmark for Physical Reasoning
The authors introduce a new game-style benchmark for physical reasoning, PHYRE, which contains a set of puzzles in a 2D physical environment using a set of parameterized task templates and variations on each template. The paper also presents baseline agents based on a non-parametric memorization strategy, DQN, and online learning variants of these agents. Reviewers are concerned that there is not enough visual complexity (shapes, textures, etc.), that the domain of physical reasoning is quite limited, and that the evaluations can be improved with more rigorous baselines. Although two reviewers see the work as marginally below threshold, all reviewers think an "accept" is reasonable.
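The meta-review mentions a non-parametric memorization baseline and online-learning variants. The following is a minimal sketch of that idea under toy assumptions, not the authors' implementation or the PHYRE API: `ToyTask`, `simulate`, `make_tasks`, and `MemorizationAgent` are hypothetical, and each "puzzle" is reduced to a 1D interval that an action must land in. The agent ranks a fixed set of candidate actions by the fraction of training tasks each one solved, then tries them on a test task in that order. The `online=True` flag is one plausible reading of an online variant: it also folds test-time outcomes back into the counts.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ToyTask:
    """Hypothetical stand-in for a puzzle: an action (a position in [0, 1])
    solves the task if it lands inside [lo, hi]."""
    template: int
    variation: int
    lo: float
    hi: float


def simulate(task: ToyTask, action: float) -> bool:
    """Hypothetical simulator: True if `action` solves `task`."""
    return task.lo <= action <= task.hi


def make_tasks(n_templates: int, n_variations: int, rng: random.Random) -> List[ToyTask]:
    """Each template has its own solution region; variations shift it slightly."""
    tasks = []
    for t in range(n_templates):
        center = rng.uniform(0.1, 0.9)
        for v in range(n_variations):
            c = center + rng.uniform(-0.03, 0.03)
            tasks.append(ToyTask(t, v, c - 0.05, c + 0.05))
    return tasks


class MemorizationAgent:
    """Ranks a fixed set of candidate actions by their success rate so far."""

    def __init__(self, actions: List[float], online: bool = False):
        self.actions = list(actions)
        self.online = online
        self.solved = [0] * len(self.actions)
        self.tried = [0] * len(self.actions)

    def train(self, tasks: List[ToyTask]) -> None:
        # Try every candidate action on every training task and count successes.
        for task in tasks:
            for i, action in enumerate(self.actions):
                self.tried[i] += 1
                self.solved[i] += int(simulate(task, action))

    def ranking(self) -> List[int]:
        # Highest success rate first; sorted() is stable, so ties keep action order.
        rate = [s / t if t else 0.0 for s, t in zip(self.solved, self.tried)]
        return sorted(range(len(self.actions)), key=lambda i: -rate[i])

    def solve(self, task: ToyTask, max_attempts: int) -> Optional[int]:
        """Return the number of attempts used, or None if unsolved."""
        order = self.ranking()[:max_attempts]  # ranking is fixed for this task
        for attempt, i in enumerate(order, start=1):
            ok = simulate(task, self.actions[i])
            if self.online:
                # Online variant: test-time feedback updates the counts,
                # which changes the ranking used for later tasks.
                self.tried[i] += 1
                self.solved[i] += int(ok)
            if ok:
                return attempt
        return None


if __name__ == "__main__":
    rng = random.Random(0)
    tasks = make_tasks(n_templates=5, n_variations=10, rng=rng)
    train = [t for t in tasks if t.variation < 8]  # unseen variations of seen templates
    test = [t for t in tasks if t.variation >= 8]
    agent = MemorizationAgent([i / 100 for i in range(101)])
    agent.train(train)
    results = [agent.solve(t, max_attempts=20) for t in test]
    print(sum(r is not None for r in results), "of", len(test), "test tasks solved")
```

Because the agent only memorizes which actions worked before, it can do well when test tasks are variations of training templates but has nothing to transfer when a template's solution region was never seen in training.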
Forward Prediction for Physical Reasoning
Girdhar, Rohit, Gustafson, Laura, Adcock, Aaron, van der Maaten, Laurens
Physical reasoning requires forward prediction: the ability to forecast what will happen next given some initial world state. We study the performance of state-of-the-art forward-prediction models in complex physical-reasoning tasks. We do so by incorporating models that operate on object-based or pixel-based representations of the world into simple physical-reasoning agents. We find that forward-prediction models improve the performance of physical-reasoning agents, particularly on complex tasks that involve many objects. However, we also find that these improvements are contingent on the training tasks being similar to the test tasks, and that generalization to different tasks is more challenging. Surprisingly, we observe that forward predictors with better pixel accuracy do not necessarily lead to better physical-reasoning performance. Nevertheless, our best models set a new state-of-the-art on the PHYRE benchmark for physical reasoning.
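The idea of plugging a forward-prediction model into a simple physical-reasoning agent can be sketched as follows, under toy assumptions: a single ball is thrown horizontally, the action is its launch speed, and the task is solved if it lands inside a goal interval. All names here (`euler_forward_model`, `true_landing_x`, `rank_actions`, `solve`) are hypothetical and this is not the models or agents from the paper. The forward model is object-based (it updates one ball's position and velocity); the hand-written distance-to-goal score is an assumption standing in for whatever scoring component a real agent would use. The agent rolls each candidate action forward with the model and spends its limited real simulations on the best-predicted actions.

```python
import math
from typing import Callable, List, Optional, Tuple

# (x, y, vx, vy) for a single ball: a minimal object-based state.
State = Tuple[float, float, float, float]
ForwardModel = Callable[[State], List[State]]
G = 9.81


def euler_forward_model(state: State, dt: float = 0.01, max_steps: int = 1000) -> List[State]:
    """Hypothetical object-based forward model: semi-implicit Euler rollout
    of a ball under gravity until it reaches the ground (y <= 0)."""
    x, y, vx, vy = state
    trajectory = [state]
    for _ in range(max_steps):
        vy -= G * dt
        x += vx * dt
        y += vy * dt
        trajectory.append((x, y, vx, vy))
        if y <= 0.0:
            break
    return trajectory


def true_landing_x(vx: float, start_y: float) -> float:
    """Stand-in for the real simulator: exact landing point of a horizontal throw."""
    return vx * math.sqrt(2.0 * start_y / G)


def rank_actions(actions: List[float], start_y: float, goal: Tuple[float, float],
                 forward_model: ForwardModel) -> List[float]:
    """Order launch speeds by how close the *predicted* landing point is to the goal center."""
    center = (goal[0] + goal[1]) / 2.0

    def predicted_error(vx: float) -> float:
        final_x = forward_model((0.0, start_y, vx, 0.0))[-1][0]
        return abs(final_x - center)

    return sorted(actions, key=predicted_error)


def solve(actions: List[float], start_y: float, goal: Tuple[float, float],
          forward_model: ForwardModel, max_attempts: int) -> Optional[Tuple[int, float]]:
    """Spend up to `max_attempts` real simulations on the best-predicted actions.
    Returns (attempts used, winning action) or None if nothing worked."""
    ranked = rank_actions(actions, start_y, goal, forward_model)
    for attempt, vx in enumerate(ranked[:max_attempts], start=1):
        if goal[0] <= true_landing_x(vx, start_y) <= goal[1]:
            return attempt, vx
    return None


if __name__ == "__main__":
    speeds = [i / 10 for i in range(51)]  # candidate launch speeds 0.0 .. 5.0
    print(solve(speeds, start_y=1.0, goal=(0.9, 1.1),
                forward_model=euler_forward_model, max_attempts=3))
```

In this toy setup, swapping in a model that ignores dynamics (for example `lambda s: [s]`, which predicts the ball never moves) gives every action the same score, so the agent falls back to trying actions in list order and needs many more real attempts. The Euler model is not exact, but its ranking only has to be good enough to put a solving action near the top.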
PHYRE: A New Benchmark for Physical Reasoning
Bakhtin, Anton, Maaten, Laurens van der, Johnson, Justin, Gustafson, Laura, Girshick, Ross
Why Setting A Benchmark For Physical Reasoning In AI Matters
Modern machines can now be taught to learn, adapt and improvise with considerable skill. A few years ago, asking a robot to run, do a cartwheel or throw a pitch would have sounded like a chapter from a generic sci-fi novel. But with advances in hardware acceleration and the optimisation of machine learning algorithms, techniques like Reinforcement Learning are now being put to practical use. Hard-coding a robot to perform even mundane skills, and perform them poorly, takes a great deal of computational heavy lifting. Making the robot perform decently in unstructured, real-world situations takes ingenious assumptions about the constraints it will face.