On the Integration of Linguistic Features into Statistical and Neural Machine Translation
arXiv.org Artificial Intelligence
New machine translation (MT) technologies are emerging rapidly, and with them bold claims of human parity have emerged (Laubli et al., 2018), such as: (i) the results produced approach the "accuracy achieved by average bilingual human translators" (Wu et al., 2017b) or (ii) the "translation quality is at human parity when compared to professional human translators" (Hassan et al., 2018). Aside from the fact that many of these papers craft their own definition of human parity, these sensational claims are often not supported by a complete analysis of all aspects involved in translation. Establishing the discrepancies between the strengths of statistical approaches to MT and the way humans translate has been the starting point of our research. By examining MT output in the light of linguistic theory, we were able to identify some remaining issues. The problems range from simple number and gender agreement errors to more complex phenomena such as the correct translation of aspectual values and tenses. In line with other studies (Bentivogli et al., 2016), our experiments confirm that neural MT has surpassed statistical MT in many respects. However, some problems remain and others have emerged. We cover a series of problems related to the integration of specific linguistic features into statistical and neural MT, aiming to analyse some of them and provide solutions. Our work addresses three main research questions that revolve around the complex relationship between linguistics and MT in general. We identify linguistic information that automatic translation systems lack in order to produce more accurate translations, and we integrate additional features into the existing pipelines. We identify overgeneralization, or 'algorithmic bias', as a potential drawback of neural MT and link it to many of the remaining linguistic issues.
Mar-31-2020
- Country:
- South America > Chile
- Oceania > Australia
- Victoria > Melbourne (0.04)
- New South Wales > Sydney (0.04)
- North America
- United States
- Maryland > Baltimore (0.13)
- Utah (0.04)
- Hawaii (0.04)
- New Jersey (0.04)
- District of Columbia > Washington (0.04)
- Texas > Travis County
- Austin (0.04)
- Michigan > Washtenaw County
- Ann Arbor (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.13)
- Colorado > Denver County
- Denver (0.04)
- Ohio > Franklin County
- Columbus (0.04)
- New York > Tompkins County
- Ithaca (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Oregon > Multnomah County
- Portland (0.04)
- Connecticut > Fairfield County
- Westport (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- Washington > King County
- Seattle (0.04)
- Kansas > Douglas County
- Lawrence (0.04)
- Florida > Palm Beach County
- Boca Raton (0.04)
- Tennessee > Shelby County
- Memphis (0.04)
- California
- San Francisco County > San Francisco (0.14)
- San Diego County > San Diego (0.04)
- Ventura County > Thousand Oaks (0.04)
- Alameda County > Berkeley (0.04)
- Santa Clara County
- Los Angeles County
- Los Angeles (0.13)
- Long Beach (0.04)
- Canada
- Europe
- Czechia > Prague (0.04)
- Slovenia (0.04)
- Romania (0.04)
- Latvia > Riga Municipality
- Riga (0.04)
- Iceland > Capital Region
- Reykjavik (0.04)
- Hungary > Budapest
- Budapest (0.04)
- Germany
- Berlin (0.04)
- North Rhine-Westphalia > Cologne Region
- Bonn (0.04)
- Spain
- Canary Islands (0.04)
- Valencian Community
- Valencia Province > Valencia (0.04)
- Alicante Province > Alicante (0.04)
- Galicia > A Coruña Province
- Santiago de Compostela (0.04)
- Catalonia > Barcelona Province
- Barcelona (0.04)
- Netherlands
- North Holland > Amsterdam (0.04)
- Gelderland > Nijmegen (0.04)
- Denmark > Capital Region
- Copenhagen (0.04)
- Sweden > Uppsala County
- Uppsala (0.04)
- Bulgaria > Sofia City Province
- Sofia (0.04)
- Greece > Attica
- Athens (0.04)
- France
- Île-de-France > Paris
- Paris (0.04)
- Occitanie > Haute-Garonne
- Toulouse (0.04)
- Italy
- Tuscany > Florence (0.04)
- Trentino-Alto Adige/Südtirol > Trentino Province
- Trento (0.04)
- Piedmont > Turin Province
- Turin (0.04)
- Belgium
- Brussels-Capital Region > Brussels (0.04)
- Flanders > Flemish Brabant
- Leuven (0.04)
- Middle East > Republic of Türkiye
- Istanbul Province > Istanbul (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- United Kingdom
- Scotland > City of Edinburgh
- Edinburgh (0.04)
- England
- Cambridgeshire > Cambridge (0.14)
- Oxfordshire > Oxford (0.14)
- Greater London > London (0.04)
- Essex (0.04)
- Switzerland > Geneva
- Geneva (0.04)
- Portugal
- Asia
- South Korea (0.04)
- Thailand > Phuket
- Phuket (0.04)
- Taiwan > Taiwan Province
- Taipei (0.04)
- Middle East
- Republic of Türkiye > Istanbul Province
- Istanbul (0.04)
- Qatar > Ad-Dawhah
- Doha (0.04)
- Japan > Kyūshū & Okinawa
- Kyūshū > Miyazaki Prefecture > Miyazaki (0.04)
- India
- Maharashtra > Mumbai (0.04)
- Karnataka > Bengaluru (0.04)
- China
- Hong Kong (0.04)
- Beijing > Beijing (0.04)
- Fujian Province > Xiamen (0.04)
- Africa > Middle East
- Egypt > Giza Governorate > Giza (0.04)
- Genre:
- Research Report
- New Finding (1.00)
- Experimental Study (1.00)
- Industry:
- Education (0.92)
- Government (0.92)
- Law (0.67)
- Technology: