AITopics | Machine Translation

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice with free translators such as CAPITA, Google Translate, SDL International, and SYSTRAN.

Self-Guided Curriculum Learning for Neural Machine Translation

Zhou, Lei, Ding, Liang, Duh, Kevin, Sasano, Ryohei, Takeda, Koichi

arXiv.org Artificial Intelligence, May-10-2021

In the field of machine learning, a well-trained model is assumed to be able to recover the training labels, i.e. the synthetic labels predicted by the model should be as close to the ground-truth labels as possible. Inspired by this, we propose a self-guided curriculum strategy that encourages the learning of neural machine translation (NMT) models to follow this recovery criterion, where we cast the recovery degree of each training example as its learning difficulty. Specifically, we adopt the sentence-level BLEU score as the proxy for recovery degree. Unlike existing curricula that rely on linguistic prior knowledge or third-party language models, our chosen learning difficulty is better suited to measuring the degree of knowledge mastery of the NMT models. Experiments on translation benchmarks, including WMT14 English$\Rightarrow$German and WMT17 Chinese$\Rightarrow$English, demonstrate that our approach consistently improves translation performance over a strong Transformer baseline.
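As a rough illustration of the idea in this abstract, the sketch below scores each training pair by how well a model's own output recovers the reference, then orders the pairs from easiest to hardest. It is not the authors' implementation: the add-one smoothed BLEU, the easy-first ordering, and the names `sentence_bleu`, `order_by_recovery`, and `translate` are all assumptions made for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Smoothed sentence-level BLEU in [0, 1] (add-one smoothing on every order).

    Assumption: whitespace tokenization and add-one smoothing are chosen here
    for simplicity; the abstract does not specify either.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp or not ref:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        log_precision += math.log((matches + 1) / (total + 1)) / max_n
    # Brevity penalty for hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_precision)


def order_by_recovery(examples, translate):
    """Sort (source, reference) pairs by recovery degree, highest first.

    `translate` is a hypothetical callable standing in for a trained NMT model.
    Treating high BLEU as "easy" and presenting it first is an assumption.
    """
    scored = [(sentence_bleu(translate(src), ref), src, ref) for src, ref in examples]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Toy usage: a lookup table plays the role of the model.
toy_model = {"guten morgen": "good morning", "wie geht es dir": "how goes it"}
pairs = [("wie geht es dir", "how are you"), ("guten morgen", "good morning")]
curriculum = order_by_recovery(pairs, lambda src: toy_model[src])
```

In this toy run the perfectly recovered pair is placed first, so a trainer consuming `curriculum` in order would see it before the harder example.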

artificial intelligence, machine translation, recovery degree, (18 more...)

arXiv.org Artificial Intelligence

2105.04475

Genre: Research Report (1.00)

Industry:

Education (0.49)
Health & Medicine (0.46)
Materials > Chemicals > Industrial Gases > Liquified Gas (0.46)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Continual Mixed-Language Pre-Training for Extremely Low-Resource Neural Machine Translation

Liu, Zihan, Winata, Genta Indra, Fung, Pascale

arXiv.org Artificial Intelligence, May-9-2021

The data scarcity in low-resource languages has become a bottleneck to building robust neural machine translation systems. Fine-tuning a multilingual pre-trained model (e.g., mBART (Liu et al., 2020)) on the translation task is a good approach for low-resource languages; however, its performance will be greatly limited when there are unseen languages in the translation pairs. In this paper, we present a continual pre-training (CPT) framework on mBART to effectively adapt it to unseen languages. We first construct noisy mixed-language text from the monolingual corpus of the target language in the translation pair to cover both the source and target languages, and then, we continue pre-training mBART to reconstruct the original monolingual text. Results show that our method can consistently improve the fine-tuning performance upon the mBART baseline, as well as other strong baselines, across all tested low-resource translation pairs containing unseen languages. Furthermore, our approach also boosts the performance on translation pairs where both languages are seen in the original mBART's pre-training. The code is available at https://github.com/zliucr/cpt-nmt.

translation, translation pair, unseen language, (16 more...)

arXiv.org Artificial Intelligence

2105.03953

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

How is Artificial Intelligence Challenging the Translation Industry?

#artificialintelligence, May-6-2021, 05:35:28 GMT

Language is perhaps the most defining factor of humankind. What makes humans different from other animals on the planet is our ability to speak out and communicate via framed words and sentences. The language of a population is one of the most defining factors across countries and nationalities, regions, and cultures. It can define the history, sociocultural situation, and even geographic diversity. From ancient times, there has been a trend for people to understand the language of one another. History traces back to Greeks and Romans traveling all across the world to discover, decipher and translate languages to find out the cultural, political, and social situations from one era to another.

artificial intelligence challenging, translation, translation industry, (12 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.52)

Restoring and Mining the Records of the Joseon Dynasty via Neural Language Modeling and Machine Translation

Kang, Kyeongpil, Jin, Kyohoon, Yang, Soyoung, Jang, Sujin, Choo, Jaegul, Kim, Youngbin

arXiv.org Artificial Intelligence, May-6-2021

Understanding voluminous historical records provides clues about the past in various respects, such as social and political issues and even natural science facts. However, it is generally difficult to fully utilize historical records, since most of the documents are not written in a modern language and parts of their contents have been damaged over time. As a result, restoring the damaged or unrecognizable parts and translating the records into modern languages are crucial tasks. In response, we present a multi-task learning approach to restore and translate historical documents based on a self-attention mechanism, specifically utilizing two Korean historical records that are among the most voluminous historical records in the world. Experimental results show that our approach significantly improves the accuracy of the translation task over baselines without multi-task learning. In addition, we present an in-depth exploratory analysis of our translated results via topic modeling, uncovering several significant historical events.

computational linguistic, historical record, translation task, (13 more...)

arXiv.org Artificial Intelligence

2104.05964

Country:

Asia > South Korea > Seoul > Seoul (0.05)
Asia > China (0.04)
Asia > South Korea > Daejeon > Daejeon (0.04)
(7 more...)

Genre: Research Report > New Finding (0.34)

Industry: Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Full-Sentence Models Perform Better in Simultaneous Translation Using the Information Enhanced Decoding Strategy

Yang, Zhengxin

arXiv.org Artificial Intelligence, May-5-2021

Simultaneous translation, which starts translating each sentence after receiving only a few words of the source sentence, plays a vital role in many scenarios. Although the previous prefix-to-prefix framework is considered suitable for simultaneous translation and achieves good performance, it still has two inevitable drawbacks: the high computational cost of training a separate model for each latency $k$, and an insufficient ability to encode information, because each target token can only attend to a specific source prefix. We propose a novel framework that adopts a simple but effective decoding strategy designed for full-sentence models. Within this framework, a single trained full-sentence model can achieve any given latency and save computational resources. Moreover, because the full-sentence model can encode the whole sentence, our decoding strategy can enhance the information maintained in the decoded states in real time. Experimental results show that our method achieves better translation quality than baselines in four directions: Zh$\rightarrow$En, En$\rightarrow$Ro and En$\leftrightarrow$De.
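To make the latency $k$ in this abstract concrete, here is a minimal sketch of a wait-k style read/write schedule driven by a single model at decoding time. It assumes the prefix-to-prefix policy is the common wait-k rule, under which target token t may see the first min(k + t - 1, |x|) source tokens. It does not reproduce the paper's information enhanced decoding strategy, and `decode_step` is a hypothetical stand-in for one step of a full-sentence model.

```python
def wait_k_schedule(k, source_len, target_len):
    """Number of source tokens visible when emitting target token t (1-indexed)."""
    return [min(k + t - 1, source_len) for t in range(1, target_len + 1)]


def simultaneous_decode(source_tokens, k, decode_step, max_len=50, eos="</s>"):
    """Greedy decoding that reveals the source according to the wait-k schedule.

    `decode_step(source_prefix, target_so_far)` is a hypothetical callable that
    returns the next target token; the same model serves any value of k.
    """
    target = []
    while len(target) < max_len:
        # The next token is number len(target) + 1, so it sees k + len(target) source tokens.
        visible = min(k + len(target), len(source_tokens))
        token = decode_step(source_tokens[:visible], target)
        if token == eos:
            break
        target.append(token)
    return target


def copy_step(source_prefix, target_so_far):
    """Toy stand-in model: copy the source token by token, then stop."""
    if len(target_so_far) < len(source_prefix):
        return source_prefix[len(target_so_far)]
    return "</s>"


schedule = wait_k_schedule(3, 5, 6)
copied = simultaneous_decode(["a", "b", "c"], k=2, decode_step=copy_step)
```

Because the schedule is applied only at decoding time, changing `k` requires no retraining in this sketch, which is the resource saving the abstract describes.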

computational linguistic, full-sentence model, translation, (15 more...)

arXiv.org Artificial Intelligence

2105.01893

Country:

Europe > Italy > Tuscany > Florence (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Asia > India > Karnataka > Bengaluru (0.04)
(8 more...)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Why Ambitious Predictions About A.I. Are Always Wrong

Slate, May-4-2021, 09:45:00 GMT

Since the very beginning of the computer revolution, researchers have dreamed of creating computers that would rival the human brain. Our brains are information machines that use inputs to generate outputs, and so are computers. How hard could it be to build computers that work as well as our brains? In 1954 a Georgetown-IBM team predicted that language translation programs would be perfected in three to five years. In 1965 Herbert Simon said that "machines will be capable, within twenty years, of doing any work a man can do."

computer, intelligence, moonshot, (14 more...)

Slate

Country:

North America > United States > Arizona (0.05)
Asia > China (0.05)

Industry:

Leisure & Entertainment (1.00)
Transportation > Ground > Road (0.71)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.53)
Health & Medicine > Therapeutic Area > Immunology (0.52)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.54)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.35)

The internet is excluding Asian-Americans who don't speak English

MIT Technology Review, May-4-2021, 09:00:00 GMT

And it starts right at the beginning. Instead of the Hmong word for "hello" or "welcome," she says, is "something else that said, like, 'your honor' or 'the queen' or 'the king' instead." Seeing something so simple done incorrectly was frustrating and off-putting. "Not only was it just probably churned through Google Translate, it wasn't even peer edited and reviewed to ensure that there was fluency and coherence," she says. Xiong says this kind of carelessness is common online--and it's one reason she and others in the Hmong community can feel excluded from politics.

asian-american, platform, speak english, (6 more...)

MIT Technology Review

Country: North America > United States (0.35)

Industry:

Health & Medicine (0.57)
Government (0.53)
Information Technology (0.35)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.37)

Limited English Skills Can Mean Limited Access to the COVID-19 Vaccine

Slate, Apr-30-2021, 20:30:05 GMT

This story was published in partnership with Type Investigations with support from the Puffin Foundation. In California, non-English speakers handed COVID-19 vaccination cards without information on what they mean. In Pennsylvania, people who speak Mandarin, Korean, and Japanese unable to make vaccine appointments due to a lack of interpreters at hospital call centers. These are just a few of the examples captured in a new complaint filed on Friday to the U.S. Department of Health and Human Services' Office for Civil Rights, Federal Emergency Management Agency's Office of Equal Rights, and Department of Homeland Security's Office for Civil Rights and Civil Liberties. The complaint, brought by the National Health Law Program, finds widespread problems across the country that inhibit access to COVID-19 resources for people with limited English proficiency (LEP).

allkhenfr, translation, website, (14 more...)

Slate

Country:

North America > United States > Pennsylvania (0.25)
North America > United States > California (0.25)
North America > United States > Virginia (0.06)
(13 more...)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.32)

Translate All: Automating multiple file type batch translation with AWS CloudFormation

#artificialintelligence, Apr-29-2021, 21:20:52 GMT

This is a guest post by Cyrus Wong, an AWS Machine Learning Hero. You can learn more about and connect with AWS Machine Learning Heroes at the community page. On July 29, 2020, AWS announced that Amazon Translate now supports Microsoft Office documents, including .docx, The world is full of bilingual countries and cities like Hong Kong. I find myself always needing to prepare Office documents and presentation slides in both English and Chinese.

aws cloudformation, course material, multiple file type batch translation, (10 more...)

#artificialintelligence

Country: Asia > China > Hong Kong (0.28)

Industry: Retail > Online (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.84)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.31)

Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation

Freitag, Markus, Foster, George, Grangier, David, Ratnakar, Viresh, Tan, Qijun, Macherey, Wolfgang

arXiv.org Artificial Intelligence, Apr-29-2021

Human evaluation of modern high-quality machine translation systems is a difficult problem, and there is increasing evidence that inadequate evaluation procedures can lead to erroneous conclusions. While there has been considerable research on human evaluation, the field still lacks a commonly-accepted standard procedure. As a step toward this goal, we propose an evaluation methodology grounded in explicit error analysis, based on the Multidimensional Quality Metrics (MQM) framework. We carry out the largest MQM research study to date, scoring the outputs of top systems from the WMT 2020 shared task in two language pairs using annotations provided by professional translators with access to full document context. We analyze the resulting data extensively, finding among other results a substantially different ranking of evaluated systems from the one established by the WMT crowd workers, exhibiting a clear preference for human over machine output. Surprisingly, we also find that automatic metrics based on pre-trained embeddings can outperform human crowd workers. We make our corpus publicly available for further research.

correlation, evaluation, translation, (14 more...)

arXiv.org Artificial Intelligence

2104.14478

Country:

Asia > Japan > Honshū > Tōhoku (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
(4 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
