Neural Machine Translation Data Generation and Augmentation using ChatGPT
–arXiv.org Artificial Intelligence
Neural models have revolutionized the field of machine translation, but creating parallel corpora is expensive and time-consuming. We investigate an alternative to manual parallel corpora: hallucinated parallel corpora created by generative language models. Although these models are themselves trained on parallel data, they can leverage a multilingual vector space to create data, and may be able to supplement small manually procured corpora. Our experiments highlight two key findings: despite a lack of diversity in the models' output, the hallucinated data improves the translation signal, even when its domain clashes with that of the original dataset.
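The abstract describes two things: supplementing a small manually procured corpus with generated sentence pairs, and observing that the generated output lacks diversity. The Python sketch below illustrates both ideas under stated assumptions. The helper names `augment` and `distinct_n`, the exact-match deduplication, the whitespace tokenization, and the use of a distinct-n ratio as the diversity measure are all hypothetical choices made for illustration. They are not the authors' pipeline, and the example sentence pairs are invented.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def distinct_n(sentences: List[str], n: int = 2) -> float:
    """Ratio of unique n-grams to total n-grams across all sentences.

    Lower values mean more repetitive (less diverse) text.
    Whitespace tokenization is a simplifying assumption.
    """
    total = 0
    unique = set()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0


def augment(manual: List[Pair], synthetic: List[Pair]) -> List[Pair]:
    """Append synthetic pairs that are not already present (exact match only).

    Hypothetical helper: the manual corpus is kept as-is and comes first.
    """
    seen = set(manual)
    combined = list(manual)
    for pair in synthetic:
        if pair not in seen:
            seen.add(pair)
            combined.append(pair)
    return combined


if __name__ == "__main__":
    manual = [
        ("the cat sleeps", "le chat dort"),
        ("the dog runs", "le chien court"),
    ]
    # Invented "generated" pairs: one duplicates a manual pair,
    # and two share most of their wording.
    synthetic = [
        ("the cat sleeps", "le chat dort"),
        ("the bird sings", "l'oiseau chante"),
        ("the bird sings well", "l'oiseau chante bien"),
    ]
    corpus = augment(manual, synthetic)
    print(len(corpus))  # 4: the duplicate pair is dropped
    print(distinct_n([tgt for _, tgt in synthetic]))  # 0.8: 4 unique of 5 bigrams
```

A distinct-n ratio is only one possible proxy for the diversity the abstract mentions. In practice, one would compare it between the manual and generated target sides rather than read the value in isolation.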
Jul-11-2023
- Country:
  - North America > Canada
    - British Columbia (0.05)
  - Europe
    - Germany > Berlin (0.05)
    - Italy > Tuscany
      - Florence (0.04)
    - France > Provence-Alpes-Côte d'Azur
      - Bouches-du-Rhône > Marseille (0.04)
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
- Genre:
  - Research Report > New Finding (0.68)
- Technology: