Probing the effectiveness of World Models for Spatial Reasoning through Test-time Scaling