AITopics | Machine Translation

Collaborating Authors

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice by using free translators such as: CAPITA, Google Translate, SDL International, SYSTRAN.

News Overviews Instructional Materials AI-Alerts Classics

How to Measure Gender Bias in Machine Translation: Optimal Translators, Multiple Reference Points

Farkas, Anna, Németh, Renáta

arXiv.org Machine LearningNov-12-2020

In this paper--as a case study--we present a systematic study of gender bias in machine translation with Google Translate. We translated sentences containing names of occupations from Hungarian, a language with gender-neutral pronouns, into English. Our aim was to present a fair measure for bias by comparing the translations to an optimal non-biased translator. When assessing bias, we used the following reference points: (1) the distribution of men and women among occupations in both the source and the target language countries, as well as (2) the results of a Hungarian survey that examined if certain jobs are generally perceived as feminine or masculine. We also studied how expanding sentences with adjectives referring to occupations effect the gender of the translated pronouns. As a result, we found bias against both genders, but biased results against women are much more frequent. Translations are closer to our perception of occupations than to objective occupational statistics. Finally, occupations have a greater effect on translation than adjectives.

artificial intelligence, natural language, occupation, (17 more...)

arXiv.org Machine Learning

2011.06445

Country:

Europe > Hungary > Budapest > Budapest (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.93)

Industry:

Law (1.00)
Health & Medicine (0.68)
Government > Regional Government > North America Government > United States Government (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

On learning language-invariant representations for universal machine translation

AIHubNov-9-2020, 09:50:52 GMT

Despite the recent improvements in neural machine translation (NMT), training a large NMT model with hundreds of millions of parameters usually requires a collection of parallel corpora at a large scale, on the order of millions or even billions of aligned sentences for supervised training (Arivazhagan et al.). While it might be possible to automatically crawl the web to collect parallel sentences for high-resource language pairs, such as German-English and French-English, it is often infeasible or expensive to manually translate large amounts of sentences for low-resource language pairs, such as Nepali-English, Sinhala-English, etc. To this end, the goal of the so-called multilingual universal machine translation, a.k.a., universal machine translation (UMT), is to learn to translate between any pair of languages using a single system, given pairs of translated documents for some of these languages. The hope is that by learning a shared "semantic space" between multiple source and target languages, the model can leverage language-invariant structure from high-resource translation pairs to transfer to the translation between low-resource language pairs, or even enable zero-shot translation. Indeed, training such a single massively multilingual model has gained impressive empirical results, especially in the case of low-resource language pairs (see Figure 1).

language-invariant representation, representation, translation, (16 more...)

AIHub

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

gordicaleksa/pytorch-original-transformer

#artificialintelligenceNov-7-2020, 11:10:16 GMT

This repo contains PyTorch implementation of the original transformer paper ( Vaswani et al.). It's aimed at making it easy to start playing and learning about transformers. Important note: I'll be adding a jupyter notebook soon as well! Transformers were originally proposed by Vaswani et al. in a seminal paper called Attention Is All You Need. You probably heard of transformers one way or another. GPT-3 and BERT to name a few well known ones .

gordicaleksa pytorch-original-transformer, transformer, translation, (13 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.71)

Add feedback

AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations

Han, Lifeng, Jones, Gareth, Smeaton, Alan

arXiv.org Artificial IntelligenceNov-7-2020

In this work, we present the construction of multilingual parallel corpora with annotation of multiword expressions (MWEs). MWEs include verbal MWEs (vMWEs) defined in the PARSEME shared task that have a verb as the head of the studied terms. The annotated vMWEs are also bilingually and multilingually aligned manually. The languages covered include English, Chinese, Polish, and German. Our original English corpus is taken from the PARSEME shared task in 2018. We performed machine translation of this source corpus followed by human post editing and annotation of target MWEs. Strict quality control was applied for error limitation, i.e., each MT output sentence received first manual post editing and annotation plus second manual quality rechecking. One of our findings during corpora preparation is that accurate translation of MWEs presents challenges to MT systems. To facilitate further MT research, we present a categorisation of the error types encountered by MT systems in performing MWE related translation. To acquire a broader view of MT issues, we selected four popular state-of-the-art MT models for comparisons namely: Microsoft Bing Translator, GoogleMT, Baidu Fanyi and DeepL MT. Because of the noise removal, translation post editing and MWE annotation by human professionals, we believe our AlphaMWE dataset will be an asset for cross-lingual and multilingual research, such as MT and information extraction. Our multilingual corpora are available as open access at github.com/poethan/AlphaMWE.

corpus, mwe, translation, (16 more...)

arXiv.org Artificial Intelligence

2011.03783

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.14)
Asia > Middle East > Palestine (0.14)
South America > Uruguay > Maldonado > Maldonado (0.04)
(18 more...)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Template Controllable keywords-to-text Generation

Mishra, Abhijit, Chowdhury, Md Faisal Mahbub, Manohar, Sagar, Gutfreund, Dan, Sankaranarayanan, Karthik

arXiv.org Artificial IntelligenceNov-7-2020

This paper proposes a novel neural model for the understudied task of generating text from keywords. The model takes as input a set of un-ordered keywords, and part-of-speech (POS) based template instructions. This makes it ideal for surface realization in any NLG setup. The framework is based on the encode-attend-decode paradigm, where keywords and templates are encoded first, and the decoder judiciously attends over the contexts derived from the encoded keywords and templates to generate the sentences. Training exploits weak supervision, as the model trains on a large amount of labeled data with keywords and POS based templates prepared through completely automatic means. Qualitative and quantitative performance analyses on publicly available test-data in various domains reveal our system's superiority over baselines, built using state-of-the-art neural machine translation and controllable transfer techniques. Our approach is indifferent to the order of input keywords.

computational linguistic, keyword, template, (15 more...)

arXiv.org Artificial Intelligence

2011.03722

Country:

Europe > Denmark > Capital Region > Copenhagen (0.04)
Asia > China > Hong Kong (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(7 more...)

Genre: Research Report (0.50)

Industry: Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages

Nekoto, Wilhelmina, Marivate, Vukosi, Matsila, Tshinondiwa, Fasubaa, Timi, Kolawole, Tajudeen, Fagbohungbe, Taiwo, Akinola, Solomon Oluwole, Muhammad, Shamsuddeen Hassan, Kabongo, Salomon, Osei, Salomey, Freshia, Sackey, Niyongabo, Rubungo Andre, Macharm, Ricky, Ogayo, Perez, Ahia, Orevaoghene, Meressa, Musie, Adeyemi, Mofe, Mokgesi-Selinga, Masabata, Okegbemi, Lawrence, Martinus, Laura Jane, Tajudeen, Kolawole, Degila, Kevin, Ogueji, Kelechi, Siminyu, Kathleen, Kreutzer, Julia, Webster, Jason, Ali, Jamiil Toure, Abbott, Jade, Orife, Iroro, Ezeani, Ignatius, Dangana, Idris Abdulkabir, Kamper, Herman, Elsahar, Hady, Duru, Goodness, Kioko, Ghollah, Murhabazi, Espoir, van Biljon, Elan, Whitenack, Daniel, Onyefuluchi, Christopher, Emezue, Chris, Dossou, Bonaventure, Sibanda, Blessing, Bassey, Blessing Itoro, Olabiyi, Ayodele, Ramkilowan, Arshath, Öktem, Alp, Akinfaderin, Adewale, Bashir, Abdallah

arXiv.org Artificial IntelligenceNov-6-2020

Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. "Low-resourced"-ness is a complex problem going beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), that plays a crucial role for information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all necessary agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets, MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released under https://github.com/masakhane-io/masakhane-mt.

computational linguistic, proceedings, translation, (14 more...)

arXiv.org Artificial Intelligence

2010.02353

Country:

Europe > Italy > Tuscany > Florence (0.05)
Asia > China > Hong Kong (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
(29 more...)

Genre: Research Report (0.82)

Industry: Education > Educational Setting > Online (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Evolution and Future of Natural Language Processing (NLP)

#artificialintelligenceNov-5-2020, 15:02:27 GMT

Natural Language Processing (NLP) a subset technique of Artificial Intelligence which is used to narrow the communication gap between the Computer and Human. It is originated from the idea of Machine Translation (MT) which came to existence during the second world war. The primary idea was to convert one human language to another human language, for example, turning the Russian language to English language using the brain of the Computers but after that, the thought of conversion of human language to computer language and vice-versa emerged, so that communication with the machine became easy. In simple words, a language can be understood as a group of rules or symbol. These symbols are integrated and then used for transmitting as well as broadcasting the information.

communication, natural language processing, nlp, (15 more...)

#artificialintelligence

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.89)

Add feedback

Translating lost languages using machine learning

#artificialintelligenceNov-4-2020, 06:45:12 GMT

Recent research suggests that most languages that have ever existed are no longer spoken. Dozens of these dead languages are also considered to be lost, or "undeciphered" -- that is, we don't know enough about their grammar, vocabulary, or syntax to be able to actually understand their texts. Lost languages are more than a mere academic curiosity; without them, we miss an entire body of knowledge about the people who spoke them. Unfortunately, most of them have such minimal records that scientists can't decipher them by using machine-translation algorithms like Google Translate. Some don't have a well-researched "relative" language to be compared to, and often lack traditional dividers like white space and punctuation.

algorithm, artificial intelligence, natural language, (11 more...)

#artificialintelligence

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.40)

Genre: Research Report > New Finding (0.71)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.56)

Add feedback

Detecting Hallucinated Content in Conditional Neural Sequence Generation

Zhou, Chunting, Gu, Jiatao, Diab, Mona, Guzman, Paco, Zettlemoyer, Luke, Ghazvininejad, Marjan

arXiv.org Artificial IntelligenceNov-4-2020

Neural sequence models can generate highly fluent sentences but recent studies have also shown that they are also prone to hallucinate additional content not supported by the input, which can cause a lack of trust in the model. To better assess the faithfulness of the machine outputs, we propose a new task to predict whether each token in the output sequence is hallucinated conditioned on the source input, and collect new manually annotated evaluation sets for this task. We also introduce a novel method for learning to model hallucination detection, based on pretrained language models fine tuned on synthetic data that includes automatically inserted hallucinations. Experiments on machine translation and abstract text summarization demonstrate the effectiveness of our proposed approach -- we obtain an average F1 of around 0.6 across all the benchmark datasets and achieve significant improvements in sentence-level hallucination scoring compared to baseline methods. We also release our annotated data and code for future research at https://github.com/violet-zct/fairseq-detect-hallucination.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2011.02593

Country:

Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York (0.04)
(4 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

TransQuest: Translation Quality Estimation with Cross-lingual Transformers

Ranasinghe, Tharindu, Orasan, Constantin, Mitkov, Ruslan

arXiv.org Artificial IntelligenceNov-4-2020

Recent years have seen big advances in the field of sentence-level quality estimation (QE), largely as a result of using neural-based architectures. However, the majority of these methods work only on the language pair they are trained on and need retraining for new language pairs. This process can prove difficult from a technical point of view and is usually computationally expensive. In this paper we propose a simple QE framework based on cross-lingual transformers, and we use it to implement and evaluate two different neural architectures. Our evaluation shows that the proposed methods achieve state-of-the-art results outperforming current open-source quality estimation frameworks when trained on datasets from WMT. In addition, the framework proves very useful in transfer learning settings, especially when dealing with low-resourced languages, allowing us to obtain very competitive results.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2011.01536

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy > Tuscany > Florence (0.04)
North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
(12 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback