DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning