Benchmarking Vision Language Models for Cultural Understanding