MUTEX: Learning Unified Policies from Multimodal Task Specifications