Using Vision-Language Models as Proxies for Social Intelligence in Human-Robot Interaction