Elucidating the Design Space of Multimodal Protein Language Models