Self-Powered LLM Modality Expansion for Large Speech-Text Models