Planning with Vision-Language Models and a Use Case in Robot-Assisted Teaching