Probing Audio-Generation Capabilities of Text-Based Language Models