Toward Socially Aware Vision-Language Models: Evaluating Cultural Competence Through Multimodal Story Generation