MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation

Cho, Woohyun, Kim, Youngmin, Lee, Sunghyun, Yu, Youngjae

Sep-19-2025–arXiv.org Artificial Intelligence

Lyrics translation requires both accurate semantic transfer and preservation of musical rhythm, syllabic structure, and poetic style. In animated musicals, the challenge intensifies due to alignment with visual and auditory cues. We introduce Multilingual Audio-Video Lyrics Benchmark for Animated Song Translation (MAVL), the first multilingual, multimodal benchmark for singable lyrics translation. By integrating text, audio, and video, MAVL enables richer and more expressive translations than text-only approaches. Building on this, we propose Syllable-Constrained Audio-Video LLM with Chain-of-Thought SylAVL-CoT, which leverages audio-video cues and enforces syllabic constraints to produce natural-sounding lyrics. Experimental results demonstrate that SylAVL-CoT significantly outperforms text-based models in singability and contextual accuracy, emphasizing the value of multimodal, multilingual approaches for lyrics translation.

large language model, machine learning, translation, (19 more...)

arXiv.org Artificial Intelligence

Sep-19-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States (0.28)

Genre:
- Research Report > New Finding (0.48)

Industry:
- Leisure & Entertainment (1.00)
- Media
  - Film (0.92)
  - Music (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language
    - Machine Translation (1.00)
    - Large Language Model (0.91)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found