Pronunciation Assessment with Multi-modal Large Language Models