Comparative Multi-View Language Grounding