Multi-modal Representation Learning Enables Accurate Protein Function Prediction in Low-Data Setting