Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization