Learning Multimodal Rewards from Rankings