Failures in Perspective-taking of Multimodal AI Systems