Context-aware Neural Machine Translation for English-Japanese Business Scene Dialogues
Honda, Sumire, Fernandes, Patrick, Zerva, Chrysoula
arXiv.org Artificial Intelligence
Despite the remarkable advancements in machine translation, the current sentence-level paradigm faces challenges when dealing with highly contextual languages like Japanese. In this paper, we explore how context-awareness can improve the performance of current Neural Machine Translation (NMT) models for English-Japanese business dialogue translation, and what kind of context provides meaningful information to improve translation. As business dialogue involves complex discourse phenomena but offers scarce training resources, we adapted a pretrained mBART model, finetuning it on multi-sentence dialogue data, which allows us to experiment with different contexts. We investigate the impact of larger context sizes and propose novel context tokens encoding extra-sentential information, such as speaker turn and scene type. We make use of Conditional Cross-Mutual Information (CXMI) to explore how much of the context the model uses, and we generalise CXMI to study the impact of the extra-sentential context. Overall, we find that models leverage both preceding sentences and extra-sentential context (with CXMI increasing with context size), and we provide a more focused analysis of honorifics translation. Regarding translation quality, increased source-side context paired with scene and speaker information improves model performance compared to previous work and our context-agnostic baselines, as measured by the BLEU and COMET metrics.
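As a rough illustration of the CXMI measure mentioned in the abstract, the sketch below estimates it as the difference between the cross-entropy of a context-agnostic model and that of a context-aware model over the same reference translations. This follows the commonly used definition of CXMI, not the generalised variant the paper proposes. The function name `cxmi_estimate` and the log-probability values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Sequence


def cxmi_estimate(
    logp_without_context: Sequence[float],
    logp_with_context: Sequence[float],
) -> float:
    """Estimate CXMI from per-sentence reference log-probabilities.

    Assumption: logp_without_context[i] is log q(y_i | x_i) under a
    context-agnostic model, and logp_with_context[i] is
    log q(y_i | x_i, c_i) under a context-aware model, both scored on the
    same N reference translations.

    CXMI is approximated as H(Y | X) - H(Y | X, C), which reduces to the
    mean of (log q(y | x, c) - log q(y | x)) over the N samples.
    """
    if len(logp_without_context) != len(logp_with_context):
        raise ValueError("Both inputs must score the same sentences.")
    if not logp_without_context:
        raise ValueError("At least one sentence is required.")

    n = len(logp_without_context)
    total = sum(
        with_ctx - without_ctx
        for without_ctx, with_ctx in zip(logp_without_context, logp_with_context)
    )
    return total / n


# Hypothetical log-probabilities for three reference sentences.
without_ctx = [-2.0, -1.5, -3.0]
with_ctx = [-1.5, -1.5, -2.0]
print(cxmi_estimate(without_ctx, with_ctx))
```

A positive value suggests that the context-aware model assigns higher probability to the references when given context, which is read as the model making use of that context; a value near zero suggests the context is largely ignored.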
Nov-20-2023
- Country:
  - Oceania > Australia
  - North America > United States
    - Pennsylvania
      - Philadelphia County > Philadelphia (0.04)
      - Allegheny County > Pittsburgh (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
  - Europe
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
    - Portugal > Lisbon
      - Lisbon (0.04)
    - Germany > Brandenburg
      - Potsdam (0.04)
    - France > Provence-Alpes-Côte d'Azur
      - Bouches-du-Rhône > Marseille (0.04)
    - Denmark > Capital Region
      - Copenhagen (0.04)
  - Asia > China
    - Hong Kong (0.04)
- Genre:
  - Research Report (0.64)
- Technology: