How to Sense the World: Leveraging Hierarchy in Multimodal Perception for Robust Reinforcement Learning Agents