A Surprising Failure? Multimodal LLMs and the NLVR Challenge