AITopics | Machine Translation

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice with free translators such as CAPITA, Google Translate, SDL International, and SYSTRAN.

Dong

AAAI Conferences, Feb-8-2022, 11:43:51 GMT

While parallel corpora are an indispensable resource for data-driven multilingual natural language processing tasks such as machine translation, they are limited in quantity, quality and coverage. As a result, learning translation models from non-parallel corpora has become increasingly important, especially for low-resource languages. In this work, we propose a joint model for iteratively learning parallel lexicons and phrases from non-parallel corpora. The model is trained using a Viterbi EM algorithm that alternates between constructing parallel phrases using lexicons and updating lexicons based on the constructed parallel phrases. Experiments on Chinese-English datasets show that our approach learns better parallel lexicons and phrases and improves translation performance significantly.
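The alternation described in this abstract can be illustrated with a toy hard-EM loop. This is only a sketch under strong simplifying assumptions, not the authors' joint model: the scoring rule, the fixed seed lexicon, the probability floor, and every name (`phrase_score`, `viterbi_step`, `update_lexicon`, `viterbi_em`) are hypothetical, and the romanized tokens are made-up toy data.

```python
from collections import defaultdict

def phrase_score(src, tgt, lexicon, floor=1e-3):
    """Score a candidate pair: for each source word, take the best lexicon
    probability among the target words, and multiply the results."""
    score = 1.0
    for s in src:
        score *= max(lexicon.get((s, t), floor) for t in tgt)
    return score

def viterbi_step(sources, targets, lexicon):
    """Hard assignment: pair each source phrase with its best-scoring target."""
    return [(src, max(targets, key=lambda tgt: phrase_score(src, tgt, lexicon)))
            for src in sources]

def update_lexicon(pairs):
    """Re-estimate p(t | s) from word co-occurrence counts in the chosen pairs."""
    counts = defaultdict(float)
    totals = defaultdict(float)
    for src, tgt in pairs:
        for s in src:
            for t in tgt:
                counts[(s, t)] += 1.0
                totals[s] += 1.0
    return {(s, t): c / totals[s] for (s, t), c in counts.items()}

def viterbi_em(sources, targets, seed_lexicon, iterations=5):
    """Alternate between building phrase pairs and updating the lexicon."""
    lexicon = dict(seed_lexicon)
    pairs = []
    for _ in range(iterations):
        pairs = viterbi_step(sources, targets, lexicon)
        # Assumption: seed entries stay fixed; new entries are added around them.
        lexicon = {**update_lexicon(pairs), **seed_lexicon}
    return lexicon, pairs

# Toy non-parallel data: two source phrases, two target phrases, a small seed.
sources = [("hei", "mao"), ("hei", "gou")]
targets = [("black", "cat"), ("black", "dog")]
seed = {("mao", "cat"): 0.9, ("gou", "dog"): 0.9}
lexicon, pairs = viterbi_em(sources, targets, seed)
```

In this toy run the seed is enough to pair the phrases, and the update step then introduces entries for the unseen source word, which is the kind of bootstrapping the abstract describes.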

corpora, parallel lexicon and phrase, parallel phrase, (1 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Huang

AAAI Conferences, Feb-8-2022, 11:43:03 GMT

Computer-aided translation (CAT) systems are the most popular tools for helping human translators translate efficiently. To further improve efficiency, there is increasing interest in applying machine translation (MT) technology to upgrade CAT. Post-editing is a standard approach: human translators generate the translation by correcting MT outputs. In this paper, we propose a novel approach that deeply integrates MT into CAT systems: a well-designed input method which makes full use of the knowledge adopted by MT systems, such as translation rules, decoding hypotheses and n-best translation lists. Our proposed approach allows human translators to focus on choosing better translation results in less time rather than completing the translation themselves. Extensive experiments demonstrate that our method saves more than 14% of time and over 33% of keystrokes, and that it improves translation quality by more than 3 absolute BLEU points compared with the strong baseline, i.e., post-editing using Google Pinyin.

huang, human translator, translation

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Lee

AAAI Conferences, Feb-8-2022, 11:23:16 GMT

We present the first automatic emotion detection system for Cantonese. This system classifies input text into eight emotion classes: expectancy, joy, love, surprise, anxiety, sorrow, angry, or hate. While a number of emotion corpora and lexica for Mandarin Chinese have been developed, no emotion dataset is available for Cantonese. We leverage existing Mandarin Chinese emotion resources to build the system, with support from Cantonese-Mandarin lexical mappings from a machine translation system, as well as English-Mandarin lexical mappings to handle code-switching in Cantonese input. Evaluation on a set of Cantonese sentences from social media shows promising results.

cantonese, lee, lexical mapping

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Alkhatib

AAAI Conferences, Feb-8-2022, 11:20:10 GMT

The transliteration of named entities from one language into another is complicated and is considered one of the challenging tasks in machine translation (MT). To build a well-performing transliteration system, we apply well-established techniques based on hybrid deep learning. The system is based on a convolutional neural network (CNN) followed by a Bi-LSTM and a CRF. The proposed hybrid mechanism is evaluated on the ANERCorp and Kalimat corpora. The results show that the neural machine translation approach can be employed to build efficient machine transliteration systems that achieve state-of-the-art results for the Arabic-English language pair.

alkhatib, transliteration system

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Ahmadnia

AAAI Conferences, Feb-8-2022, 11:18:12 GMT

Neural Machine Translation (NMT) relies heavily on word embeddings, which are continuous representations of words in a vector space, obtained from large monolingual data and, independently, from bilingual data for NMT model training. Word embeddings have proven to be invaluable for performance improvements in natural language analysis tasks that otherwise suffer from data scarcity. This paper defines a new cost function, demonstrated on Farsi-Spanish low-resource attention-based NMT, that encodes word similarity as distances within a word embedding space. The novelty of this cost function is that it encourages our attentional NMT model to generate words that are close to their references in the embedding space. This approach encourages the decoder to select acceptably similar words when potential candidates are found to be out-of-vocabulary (OOV).
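One way to picture a cost that encodes word similarity as distance in an embedding space is the expected distance between the reference word and the words the decoder assigns probability to. The sketch below is only an illustration of that general idea; the paper's actual cost function may differ, and the toy vectors, the choice of Euclidean distance, and the name `embedding_distance_cost` are all assumptions.

```python
import math

# Toy embedding table (hypothetical values, for illustration only).
EMBEDDINGS = {
    "house": [1.0, 0.0],
    "home":  [0.9, 0.1],
    "car":   [0.0, 1.0],
}

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def embedding_distance_cost(probs, reference, embeddings=EMBEDDINGS):
    """Expected embedding distance between predicted words and the reference.

    probs: dict mapping each candidate word to the decoder's probability.
    reference: the reference (gold) word.
    """
    ref_vec = embeddings[reference]
    return sum(p * euclidean(embeddings[w], ref_vec) for w, p in probs.items())

# Probability mass on a near-synonym is penalised less than mass on an
# unrelated word, even though neither matches the reference exactly.
near = embedding_distance_cost({"home": 0.9, "car": 0.1}, "house")
far = embedding_distance_cost({"home": 0.1, "car": 0.9}, "house")
```

Unlike plain cross-entropy, which treats every non-reference word as equally wrong, a cost of this shape gives partial credit to close neighbours of the reference.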

ahmadnia, attentional nmt model, cost function

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

The Ecological Footprint of Neural Machine Translation Systems

Shterionov, Dimitar, Vanmassenhove, Eva

arXiv.org Artificial Intelligence, Feb-4-2022

Over the past decade, deep learning (DL) has led to significant advancements in various fields of artificial intelligence, including machine translation (MT). These advancements would not be possible without the ever-growing volumes of data and the hardware that allows large DL models to be trained efficiently. Due to their large number of computing cores as well as dedicated memory, graphics processing units (GPUs) are a more effective hardware solution for training and inference with DL models than central processing units (CPUs). However, GPUs are very power demanding, and their electrical power consumption has economic as well as ecological implications. This chapter focuses on the ecological footprint of neural MT systems. It starts from the power drain during the training of and the inference with neural MT models and moves towards the environmental impact, in terms of carbon dioxide emissions. Different architectures (RNN and Transformer) and different GPUs (consumer-grade NVidia 1080Ti and workstation-grade NVidia P100) are compared. Then, the overall CO2 offload is calculated for Ireland and the Netherlands. The NMT models and their ecological impact are compared to common household appliances to draw a clearer picture. The last part of this chapter analyses quantization, a technique for reducing the size and complexity of models, as a way to reduce power consumption. As quantized models can run on CPUs, they present a power-efficient inference solution without depending on a GPU.
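The step from power drain to carbon dioxide emissions is, at its simplest, a unit conversion: average power times duration gives energy, and energy times the grid's carbon intensity gives emissions. The sketch below shows that arithmetic only; the function names are hypothetical and the input values are placeholders, not measurements or country figures from the chapter.

```python
def energy_kwh(avg_power_watts, hours):
    """Energy drawn over a run, in kilowatt-hours."""
    return avg_power_watts * hours / 1000.0

def co2_kg(avg_power_watts, hours, kg_co2_per_kwh):
    """Emissions for a run, given the grid's carbon intensity (kg CO2 per kWh)."""
    return energy_kwh(avg_power_watts, hours) * kg_co2_per_kwh

# Placeholder inputs for illustration only: a 250 W average draw over 24 hours
# on a grid emitting 0.4 kg CO2 per kWh.
example = co2_kg(avg_power_watts=250.0, hours=24.0, kg_co2_per_kwh=0.4)
```

Because the carbon intensity is a separate parameter, the same training run yields different emissions in different countries, which is why a per-country comparison is meaningful.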

artificial intelligence, machine learning, natural language, (17 more...)

2202.0217

Country:

Europe > Netherlands (0.25)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
(18 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology (1.00)
Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Formal Mathematics Statement Curriculum Learning

Polu, Stanislas, Han, Jesse Michael, Zheng, Kunhao, Baksys, Mantas, Babuschkin, Igor, Sutskever, Ilya

arXiv.org Artificial Intelligence, Feb-2-2022

We explore the use of expert iteration in the context of language modeling applied to formal mathematics. We show that at the same compute budget, expert iteration, by which we mean proof search interleaved with learning, dramatically outperforms proof search only. We also observe that when applied to a collection of formal statements of sufficiently varied difficulty, expert iteration is capable of finding and solving a curriculum of increasingly difficult problems, without the need for associated ground-truth proofs. Finally, by applying this expert iteration to a manually curated set of problem statements, we achieve state-of-the-art results on the miniF2F benchmark, automatically solving multiple challenging problems drawn from high school olympiads.

iteration, objective, proof search, (13 more...)

2202.01344

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Industry: Education (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)

Should AI Be Centered on Machine Learning Algorithms or Data?

#artificialintelligence, Feb-1-2022, 05:25:15 GMT

Arun Shastri, PhD, leads ZS's global AI strategy practice, which spans research, helping clients build their capabilities and platform solutions. In this role, he also oversees analytics services and solutions for several industry sectors. PKS Prakash, PhD is a principal at ZS Associates; he designs and implements advanced data science and AI techniques across multiple verticals including healthcare, hospitality, retail and manufacturing.

centered, machine learning algorithm

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.40)

Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation

Majewska, Olga, Razumovskaia, Evgeniia, Ponti, Edoardo Maria, Vulić, Ivan, Korhonen, Anna

arXiv.org Artificial Intelligence, Jan-31-2022

Multilingual task-oriented dialogue (ToD) facilitates access to services and information for many (communities of) speakers. Nevertheless, the potential of this technology is not fully realised, as current datasets for multilingual ToD, both for modular and end-to-end modelling, suffer from severe limitations. 1) When created from scratch, they are usually small in scale and fail to cover many possible dialogue flows. 2) Translation-based ToD datasets might lack naturalness and cultural specificity in the target language. In this work, to tackle these limitations we propose a novel outline-based annotation process for multilingual ToD datasets, where domain-specific abstract schemata of dialogue are mapped into natural language outlines. These in turn guide the target language annotators in writing a dialogue by providing instructions about each turn's intents and slots. Through this process we annotate a new large-scale dataset for training and evaluation of multilingual and cross-lingual ToD systems. Our Cross-lingual Outline-based Dialogue dataset (termed COD) enables natural language understanding, dialogue state tracking, and end-to-end dialogue modelling and evaluation in 4 diverse languages: Arabic, Indonesian, Russian, and Kiswahili. Qualitative and quantitative analyses of COD versus an equivalent translation-based dataset demonstrate improvements in data quality, unlocked by the outline-based approach. Finally, we benchmark a series of state-of-the-art systems for cross-lingual ToD, setting reference scores for future work and demonstrating that COD prevents over-inflated performance, typically met with prior translation-based ToD datasets.

artificial intelligence, machine learning, natural language, (19 more...)

doi: 10.1162/tacl_a_00539

2201.13405

Country:

Africa > Niger (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > New York (0.04)
(6 more...)

Genre: Research Report (0.82)

Industry:

Transportation > Air (0.93)
Transportation > Passenger (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Use a web browser plugin to quickly translate text with Amazon Translate

#artificialintelligence, Jan-28-2022, 16:18:04 GMT

Web browsers can be a single pane of glass for organizations to interact with their information: all of the tools can be viewed and accessed on one screen so that users don't have to switch between applications and interfaces. For example, a customer call center might have several different applications to see customer reviews, social media feeds, and customer data. Each of these applications is accessed through a web browser. If the information is in a language that the user doesn't speak, however, a separate application often needs to be opened to translate the text. Web browser plugins enable customization of this user experience.

amazon translate, browser plugin, plugin, (12 more...)

Industry: Retail > Online (0.40)

Technology:

Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.55)
