Why Setting A Benchmark For Physical Reasoning In AI Matters