
Neural Information Processing Systems 

Large language models (LLMs) have demonstrated strong capabilities in textual understanding and generation, but they cannot be directly applied to cross-modal tasks without fine-tuning. This paper proposes a cross-modal in-context learning approach that enables frozen LLMs to perform multiple audio tasks in a few-shot manner without any parameter updates. Specifically, we propose a novel LLM-driven audio codec model, LLM-Codec, which transfers the audio modality into the textual space by representing audio tokens with words or sub-words from the LLM vocabulary, while maintaining high audio reconstruction quality. The key idea is to reduce the modality heterogeneity between text and audio by compressing the audio modality into the well-trained textual space of the LLM. The audio representation can thus be viewed as a new foreign language, which the LLM can learn from a handful of demonstrations. In experiments, we investigate the performance of the proposed approach across multiple audio understanding and generation tasks.
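To make the core idea concrete, below is a minimal sketch of how audio frames might be quantized against a frozen LLM's token-embedding table so that each frame is represented by an existing word or sub-word, which is the mechanism the abstract describes. All names, dimensions, and the toy encoder are illustrative assumptions, not the paper's actual LLM-Codec architecture.

import torch

torch.manual_seed(0)

VOCAB_SIZE, DIM, N_FRAMES = 32000, 512, 8  # assumed sizes, not from the paper

# Stand-in for the frozen LLM's token-embedding matrix (never updated).
llm_embed = torch.randn(VOCAB_SIZE, DIM)

def toy_audio_encoder(waveform: torch.Tensor) -> torch.Tensor:
    """Placeholder audio encoder: waveform -> frame embeddings of shape (T, DIM)."""
    return torch.randn(N_FRAMES, DIM)

def quantize_to_llm_tokens(frames: torch.Tensor) -> torch.Tensor:
    """Map each audio frame to its nearest LLM vocabulary entry (L2 distance).

    The returned ids index real words/sub-words, so the audio clip becomes a
    'sentence' in the LLM's native token space, i.e. a new foreign language.
    """
    dists = torch.cdist(frames, llm_embed)  # (T, VOCAB_SIZE) pairwise distances
    return dists.argmin(dim=-1)             # (T,) vocabulary token ids

# Encode a (dummy) waveform into LLM vocabulary tokens.
audio_token_ids = quantize_to_llm_tokens(toy_audio_encoder(torch.zeros(16000)))

# With audio rendered as ordinary vocabulary tokens, a frozen LLM can be
# prompted few-shot by interleaving (audio tokens, label) demonstrations
# with a query clip; here both are the same placeholder tensor.
prompt_ids = torch.cat([audio_token_ids, audio_token_ids])
print(prompt_ids.shape)  # torch.Size([16])

The appeal of this design, as the abstract argues, is that reusing the LLM's existing embedding space requires no vocabulary expansion and no parameter updates: the frozen model treats the quantized audio exactly like text it already knows how to read.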
