Can Large Vision-Language Models Understand Multimodal Sarcasm?