Image-based Geo-localization for Robotics: Are Black-box Vision-Language Models there yet?