ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video

Cai, Kevin, Liu, Chonghua, Chan, David M.

Jan-10-2024–arXiv.org Artificial Intelligence

Up to 60% of Internet content is published in English, yet only 18.8% of the global population speaks English and just 5.1% consider it their native language, which leads to disparities in online information access. Unfortunately, automated dubbing of video (replacing the audio track of a video with a translated alternative) remains a complex and challenging task, because dubbing pipelines require precise timing, facial movement synchronization, and prosody matching. While end-to-end dubbing offers a solution, data scarcity continues to impede the progress of both end-to-end and pipeline-based methods. In this work, we introduce Anim-400K, a comprehensive dataset of over 425K aligned animated video segments in Japanese and English that supports various video-related tasks, including automated dubbing, simultaneous translation, guided video summarization, and genre/theme/style classification. Our dataset is made publicly available for research purposes at https://github.com/davidmchan/Anim400K.

anim-400k, dataset, video

Country:
- North America > United States > California > Alameda County > Berkeley (0.04)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Machine Translation (1.00)
  - Speech (0.95)
  - Machine Learning (0.68)