Document Flattening: Beyond Concatenating Context for Document-Level Neural Machine Translation
Wu, Minghao, Foster, George, Qu, Lizhen, Haffari, Gholamreza
arXiv.org Artificial Intelligence
Existing work in document-level neural machine translation commonly concatenates several consecutive sentences into a pseudo-document and then learns inter-sentential dependencies. This strategy limits the model's ability to leverage information from distant context. We overcome this limitation with a novel Document Flattening (DocFlat) technique that integrates Flat-Batch Attention (FBA) and a Neural Context Gate (NCG) into the Transformer model to utilize information beyond the pseudo-document boundaries. FBA allows the model to attend to all positions in the batch and to learn the relationships between positions explicitly, while NCG identifies the useful information in the distant context. We conduct comprehensive experiments and analyses on three benchmark datasets for English-German translation, and validate the effectiveness of two variants of DocFlat. Empirical results show that our approach outperforms strong baselines with statistical significance on BLEU, COMET, and accuracy on the contrastive test set. The analyses highlight that DocFlat is highly effective at capturing long-range information.
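To make the idea of attending across pseudo-document boundaries concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract gives no formulas, so the function names (`flat_batch_attention`, `context_gate`), the single-head unprojected attention, and the scalar sigmoid gate are all hypothetical choices made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]


def local_attention(batch):
    """Baseline: each position attends only within its own pseudo-document."""
    return [[attend(q, seg, seg) for q in seg] for seg in batch]


def flat_batch_attention(batch):
    """Flatten the batch so each position attends to every position in it."""
    flat = [vec for seg in batch for vec in seg]
    attended = iter([attend(q, flat, flat) for q in flat])
    # Restore the original (segment, position) layout.
    return [[next(attended) for _ in seg] for seg in batch]


def context_gate(local_vec, distant_vec, weights, bias=0.0):
    """Hypothetical sigmoid gate mixing local and distant context vectors."""
    # Assumption: the gate is a scalar computed from both vectors concatenated.
    logit = bias + sum(w * x for w, x in zip(weights, local_vec + distant_vec))
    gate = 1.0 / (1.0 + math.exp(-logit))
    return [gate * d + (1.0 - gate) * l for l, d in zip(local_vec, distant_vec)]


batch = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]  # two toy pseudo-documents
local = local_attention(batch)
distant = flat_batch_attention(batch)
# Zero gate weights give a gate of 0.5, i.e. an even mix of the two views.
mixed = context_gate(local[1][0], distant[1][0], weights=[0.0] * 4)
```

In this toy setup the second pseudo-document contains a single position, so local attention can only return that position's own vector, whereas the flattened variant also draws on the first pseudo-document. The gate then decides how much of that distant signal to keep.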
Feb-15-2023
- Country:
  - Asia
    - Middle East > Republic of Türkiye
      - Istanbul Province > Istanbul (0.04)
    - Thailand > Phuket
      - Phuket (0.04)
  - Europe
    - Czechia > Prague (0.04)
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Middle East > Republic of Türkiye
      - Istanbul Province > Istanbul (0.04)
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
    - Italy
      - Trentino-Alto Adige/Südtirol > Trentino Province
        - Trento (0.04)
      - Tuscany > Florence (0.04)
    - Portugal > Lisbon
      - Lisbon (0.04)
    - Denmark > Capital Region
      - Copenhagen (0.04)
    - France > Hauts-de-France
    - Germany > Berlin (0.04)
  - North America
    - Canada
      - British Columbia > Metro Vancouver Regional District
        - Vancouver (0.04)
      - Quebec > Montreal (0.04)
    - United States
      - California
        - Los Angeles County > Long Beach (0.04)
        - San Diego County > San Diego (0.04)
      - Louisiana > Orleans Parish
        - New Orleans (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
      - Nevada > Clark County
        - Las Vegas (0.04)
      - Pennsylvania > Philadelphia County
        - Philadelphia (0.04)
      - Washington > King County
        - Seattle (0.04)
  - Oceania > Australia
- Genre:
  - Research Report
    - Experimental Study (0.48)
    - New Finding (0.66)
- Technology: