BigVideo: A Large-scale Video Subtitle Translation Dataset for Multimodal Machine Translation
Kang, Liyan, Huang, Luyang, Peng, Ningxin, Zhu, Peihao, Sun, Zewei, Cheng, Shanbo, Wang, Mingxuan, Huang, Degen, Su, Jinsong
–arXiv.org Artificial Intelligence
... context to understand the world. From the perspective of NMT, it is also much needed to make use of such information to approach human-level translation abilities. To facilitate Multimodal Machine Translation (MMT) research, a number of datasets have been proposed including image-guided translation datasets (Elliott et al., 2016; Gella et al., 2019; Wang et al., 2022) and video-guided translation datasets (Sanabria et al., 2018; ...
The text inputs are often simple and sufficient for translation tasks (Wu et al., 2021). Take the widely used Multi30K as an example. Multi30K consists of only 30K image captions, while typical text translation systems are often trained with several million sentence pairs. We argue that studying the effects of visual contexts in machine translation requires a large-scale and diverse data set for training and a real-world and complex benchmark for testing.
Jul-3-2023
- Country:
- Asia
- Taiwan (0.04)
- China
- Fujian Province > Xiamen (0.04)
- Liaoning Province > Dalian (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Information Technology > Security & Privacy (0.46)
- Leisure & Entertainment > Sports (0.46)
- Technology: