Inferring Dynamic Physical Properties from Video Foundation Models