A Novel Framework for Multi-Modal Protein Representation Learning