Towards Interpretable Sleep Stage Classification Using Cross-Modal Transformers