AITopics | Machine Translation

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice with free translators such as CAPITA, Google Translate, SDL International, and SYSTRAN.

Example Of Machine Translation In Python And Tensorflow

#artificialintelligence | Jun-8-2021, 19:55:10 GMT

We will build a deep neural network that functions as part of an end-to-end machine translation pipeline. The completed pipeline will accept English text as input and return the French translation. For our model, we will use a sample of English and French sentences. The data is located in data/small_vocab_en and data/small_vocab_fr. The small_vocab_en file contains the English sentences, and the small_vocab_fr file contains their French translations.
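As a rough illustration of the data-loading step described above, the following sketch reads the two files into aligned sentence pairs and counts word frequencies. It assumes, without confirmation from the article, that each file is UTF-8 text with one sentence per line and that line i of one file corresponds to line i of the other. The helper names `load_parallel` and `vocab_counts` are hypothetical and are not taken from the article's code.

```python
from collections import Counter
from pathlib import Path


def load_parallel(en_path, fr_path):
    """Return a list of (english, french) sentence pairs.

    Assumption: both files are UTF-8, hold one sentence per line,
    and are aligned line by line.
    """
    en_lines = Path(en_path).read_text(encoding="utf-8").splitlines()
    fr_lines = Path(fr_path).read_text(encoding="utf-8").splitlines()
    if len(en_lines) != len(fr_lines):
        raise ValueError(
            f"line counts differ: {len(en_lines)} vs {len(fr_lines)}"
        )
    return list(zip(en_lines, fr_lines))


def vocab_counts(sentences):
    """Count lowercase, whitespace-separated tokens across sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.lower().split())
    return counts


if __name__ == "__main__":
    # Paths as given in the article; skipped quietly if the data is absent.
    en_file = Path("data/small_vocab_en")
    fr_file = Path("data/small_vocab_fr")
    if en_file.exists() and fr_file.exists():
        pairs = load_parallel(en_file, fr_file)
        en_vocab = vocab_counts(en for en, _ in pairs)
        fr_vocab = vocab_counts(fr for _, fr in pairs)
        print(len(pairs), "sentence pairs")
        print(len(en_vocab), "English words,", len(fr_vocab), "French words")
```

Checking that the two files have the same number of lines before pairing them catches misaligned data early, which would otherwise silently pair sentences with the wrong translations.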

neural network, sequence, translation, (13 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)

Encouraging Neural Machine Translation to Satisfy Terminology Constraints

Ailem, Melissa, Liu, Jinghsu, Qader, Raheel

arXiv.org Artificial Intelligence | Jun-7-2021

We present a new approach to encourage neural machine translation to satisfy lexical constraints. Our method acts at the training step, thereby avoiding any extra computational overhead at the inference step. The proposed method combines three main ingredients. The first consists in augmenting the training data to specify the constraints. Intuitively, this encourages the model to learn a copy behavior when it encounters constraint terms. Compared to previous work, we use a simplified augmentation strategy without source factors. The second ingredient is constraint token masking, which makes it even easier for the model to learn the copy behavior and generalize better. The third is a modification of the standard cross-entropy loss to bias the model towards assigning high probabilities to constraint words. Empirical results show that our method improves upon related baselines in terms of both BLEU score and the percentage of generated constraint terms.

constraint, machine translation, translation, (13 more...)

arXiv.org Artificial Intelligence

2106.0373

Country: Europe > France (0.04)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation

Goyal, Naman, Gao, Cynthia, Chaudhary, Vishrav, Chen, Peng-Jen, Wenzek, Guillaume, Ju, Da, Krishnan, Sanjana, Ranzato, Marc'Aurelio, Guzman, Francisco, Fan, Angela

arXiv.org Artificial Intelligence | Jun-6-2021

One of the biggest challenges hindering progress in low-resource and multilingual machine translation is the lack of good evaluation benchmarks. Current evaluation benchmarks either lack good coverage of low-resource languages, consider only restricted domains, or are low quality because they are constructed using semi-automatic procedures. In this work, we introduce the FLORES-101 evaluation benchmark, consisting of 3001 sentences extracted from English Wikipedia and covering a variety of topics and domains. These sentences have been translated into 101 languages by professional translators through a carefully controlled process. The resulting dataset enables better assessment of model quality on the long tail of low-resource languages, including the evaluation of many-to-many multilingual translation systems, as all translations are multilingually aligned. By publicly releasing such a high-quality and high-coverage dataset, we hope to foster progress in the machine translation community and beyond.

evaluation, machine translation, translation, (12 more...)

arXiv.org Artificial Intelligence

2106.03193

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.13)
Europe > Italy > Tuscany > Florence (0.04)
North America > Central America (0.04)
(7 more...)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning

Xu, Haiyang, Yan, Ming, Li, Chenliang, Bi, Bin, Huang, Songfang, Xiao, Wenming, Huang, Fei

arXiv.org Artificial Intelligence | Jun-4-2021

Vision-language pre-training (VLP) on large-scale image-text pairs has achieved huge success on cross-modal downstream tasks. Most existing pre-training methods adopt a two-step training procedure, which first employs a pre-trained object detector to extract region-based visual features, then concatenates the image representation and text embedding as the input to a Transformer for training. However, these methods face two problems: they rely on the task-specific visual representation of a particular object detector for generic cross-modal understanding, and the two-stage pipeline is computationally inefficient. In this paper, we propose the first end-to-end vision-language pre-trained model for both V+L understanding and generation, namely E2E-VLP, where we build a unified Transformer framework to jointly learn visual representation and semantic alignments between image and text. We incorporate the tasks of object detection and image captioning into pre-training with a unified Transformer encoder-decoder architecture to enhance visual learning. An extensive set of experiments has been conducted on well-established vision-language downstream tasks to demonstrate the effectiveness of this novel VLP paradigm.

architecture, e2e-vlp, representation, (15 more...)

arXiv.org Artificial Intelligence

2106.01804

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Part of Speech and Universal Dependency effects on English Arabic Machine Translation

Rafaeli, Ofek, Abend, Omri, Choshen, Leshem, Nikolaev, Dmitry

arXiv.org Artificial Intelligence | Jun-3-2021

In this research paper, I will elaborate on a method to evaluate machine translation models based on their performance on underlying syntactical phenomena between the English and Arabic languages. This method is especially important because such "neural" and "machine learning" models are hard to fine-tune and change. Thus, finding a way to evaluate them easily and diversely would greatly help the task of improving them.

artificial intelligence, natural language, translation, (14 more...)

arXiv.org Artificial Intelligence

2106.00745

Country:

Europe > United Kingdom > England (0.05)
Asia > Mongolia (0.04)
Asia > India (0.04)
Africa > South Africa > Western Cape > Cape Town (0.04)

Genre: Research Report (1.00)

Industry: Education > Educational Setting (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Language Scaling for Universal Suggested Replies Model

Ying, Qianlan, Bajaj, Payal, Deb, Budhaditya, Yang, Yu, Wang, Wei, Lin, Bojia, Shokouhi, Milad, Song, Xia, Yang, Yang, Jiang, Daxin

arXiv.org Artificial Intelligence | Jun-3-2021

We consider the problem of scaling automated suggested replies for the Outlook email system to multiple languages. Faced with increased compute requirements and low resources for language expansion, we build a single universal model for improving the quality and reducing run-time costs of our production system. However, restricted data movement across regional centers prevents joint training across languages. To this end, we propose a multi-task continual learning framework, with auxiliary tasks and language adapters to learn universal language representation across regions. The experimental results show positive cross-lingual transfer across languages while reducing catastrophic forgetting across regions. Our online results on real user traffic show significant gains in CTR and characters saved, as well as a 65% training cost reduction compared with per-language models. As a consequence, we have scaled the feature to multiple languages, including low-resource markets.

arxiv preprint arxiv, uniplm-all-cl 0, uniplm-hrl 0, (14 more...)

arXiv.org Artificial Intelligence

2106.02232

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Washington > King County > Bellevue (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)

ZmBART: An Unsupervised Cross-lingual Transfer Framework for Language Generation

Maurya, Kaushal Kumar, Desarkar, Maunendra Sankar, Kano, Yoshinobu, Deepshikha, Kumari

arXiv.org Artificial Intelligence | Jun-3-2021

Despite the recent advancement in NLP research, cross-lingual transfer for natural language generation is relatively understudied. In this work, we transfer supervision from a high-resource language (HRL) to multiple low-resource languages (LRLs) for natural language generation (NLG). We consider four NLG tasks (text summarization, question generation, news headline generation, and distractor generation) and three syntactically diverse languages, i.e., English, Hindi, and Japanese. We propose an unsupervised cross-lingual language generation framework (called ZmBART) that does not use any parallel or pseudo-parallel/back-translated data. In this framework, we further pre-train the mBART sequence-to-sequence denoising auto-encoder model with an auxiliary task using monolingual data of the three languages. The objective function of the auxiliary task is close to the target tasks, which enriches the multi-lingual latent representation of mBART and provides good initialization for target tasks. Then, this model is fine-tuned with task-specific supervised English data and directly evaluated with low-resource languages in the zero-shot setting. To overcome catastrophic forgetting and spurious correlation issues, we applied model-component freezing and data augmentation approaches, respectively. This simple modeling approach gave us promising results. We experimented with few-shot training (with 1000 supervised data points), which boosted the model performance further. We performed several ablations and cross-lingual transferability analyses to demonstrate the robustness of ZmBART.

auxiliary task, computational linguistic, zmbart, (15 more...)

arXiv.org Artificial Intelligence

2106.01597

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Italy > Tuscany > Florence (0.04)
Asia > Japan > Honshū > Chūbu > Shizuoka Prefecture > Shizuoka (0.04)
(8 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)

10 Best African Language Datasets for Data Science Projects

#artificialintelligence | Jun-2-2021, 18:01:10 GMT

Africa has over 2000 languages; however, these languages are not well represented in the existing natural language processing (NLP) ecosystem. One of the challenges is the lack of useful African language datasets that can be used to solve different social and economic problems. In this article, I have compiled a list of African language datasets from across the web. These datasets can be used in numerous NLP tasks such as text classification, named entity recognition, machine translation, sentiment analysis, speech recognition, and topic modeling. This collection of datasets has been made public to give you an opportunity to use your skills and help solve different challenges.

african language dataset, dataset, language dataset, (11 more...)

#artificialintelligence

Country:

Africa > South Africa (0.06)
Africa > Senegal (0.06)
Africa > Rwanda (0.05)
(15 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.96)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.71)

Neural Machine Translation with Monolingual Translation Memory

Cai, Deng, Wang, Yan, Li, Huayang, Lam, Wai, Liu, Lemao

arXiv.org Artificial Intelligence | Jun-2-2021

Prior work has proved that translation memory (TM) can boost the performance of Neural Machine Translation (NMT). In contrast to existing work that uses a bilingual corpus as TM and employs source-side similarity search for memory retrieval, we propose a new framework that uses monolingual memory and performs learnable memory retrieval in a cross-lingual manner. Our framework has unique advantages. First, the cross-lingual memory retriever allows abundant monolingual data to serve as TM. Second, the memory retriever and NMT model can be jointly optimized for the ultimate translation goal. Experiments show that the proposed method obtains substantial improvements. Remarkably, it even outperforms strong TM-augmented NMT baselines using bilingual TM. Owing to the ability to leverage monolingual data, our model also demonstrates effectiveness in low-resource and domain adaptation scenarios.

machine translation, proceedings, translation, (12 more...)

arXiv.org Artificial Intelligence

2105.11269

Country:

Europe (0.14)
North America > Canada > Quebec > Montreal (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Global-Selector: A New Benchmark Dataset and Model Architecture for Multi-turn Response Selection

Song, Chiyu, He, Hongliang, Qiu, Huachuan, Yu, Haofei, Lan, Zhenzhong

arXiv.org Artificial Intelligence | Jun-2-2021

As an essential component of dialogue systems, multi-turn response selection aims to pick out the optimal response among a set of candidates to improve the dialogue fluency. In this paper, we investigate three problems of current response selection approaches, especially for generation-based conversational agents: (i) Existing approaches are often formulated as a sentence scoring problem, which does not consider relationships between responses. (ii) Existing models tend to select undesirable candidates that have large overlaps with the dialogue history. (iii) Negative instances in training are mainly constructed by random sampling from the corpus, whereas generated candidates in practice typically have a closer distribution. To address the above problems, we create a new dataset called ConvAI2+ and propose a new response selector called Global-Selector. Experimental results show that Global-Selector trained on ConvAI2+ achieves noticeable improvements in both accuracy and inference speed.

arxiv e-print, computational linguistic, global-selector, (13 more...)

arXiv.org Artificial Intelligence

2106.01263

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
Asia > China > Hong Kong (0.04)
(5 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.49)
(2 more...)
