Evaluating Vision-Language Models for Emotion Recognition