Image-based Geo-localization for Robotics: Are Black-box Vision-Language Models there yet?
