AITopics | Machine Translation

Collaborating Authors

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice by using free translators such as: CAPITA, Google Translate, SDL International, SYSTRAN.

News Overviews Instructional Materials AI-Alerts Classics

IndicBART: A Pre-trained Model for Natural Language Generation of Indic Languages

Dabre, Raj, Shrotriya, Himani, Kunchukuttan, Anoop, Puduppully, Ratish, Khapra, Mitesh M., Kumar, Pratyush

arXiv.org Artificial IntelligenceSep-7-2021

In this paper we present IndicBART, a multilingual, sequence-to-sequence pre-trained model focusing on 11 Indic languages and English. Different from existing pre-trained models, IndicBART utilizes the orthographic similarity between Indic scripts to improve transfer learning between similar Indic languages. We evaluate IndicBART on two NLG tasks: Neural Machine Translation (NMT) and extreme summarization. Our experiments on NMT for 12 language pairs and extreme summarization for 7 languages using multilingual fine-tuning show that IndicBART is competitive with or better than mBART50 despite containing significantly fewer parameters. Our analyses focus on identifying the impact of script unification (to Devanagari), corpora size as well as multilingualism on the final performance. The IndicBART model is available under the MIT license at https://indicnlp.ai4bharat.org/indic-bart .

computational linguistic, indicbart, translation, (13 more...)

arXiv.org Artificial Intelligence

2109.02903

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Asia > China > Hong Kong (0.04)
(14 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Attention based Sequence to Sequence Learning for Machine Translation of Low Resourced Indic Languages -- A case of Sanskrit to Hindi

Bakarola, Vishvajit, Nasriwala, Jitendra

arXiv.org Artificial IntelligenceSep-7-2021

Deep Learning techniques are powerful in mimicking humans in a particular set of problems. They have achieved a remarkable performance in complex learning tasks. Deep learning inspired Neural Machine Translation (NMT) is a proficient technique that outperforms traditional machine translation. Performing machine-aided translation on Indic languages has always been a challenging task considering their rich and diverse grammar. The neural machine translation has shown quality results compared to the traditional machine translation approaches. The fully automatic machine translation becomes problematic when it comes to low-resourced languages, especially with Sanskrit. This paper presents attention mechanism based neural machine translation by selectively focusing on a particular part of language sentences during translation. The work shows the construction of Sanskrit to Hindi bilingual parallel corpus with nearly 10K samples and having 178,000 tokens. The neural translation model equipped with an attention mechanism has been trained on Sanskrit to Hindi parallel corpus. The approach has shown the significance of attention mechanisms to overcome long-term dependencies, primarily associated with low resources Indic languages. The paper shows the attention plots on testing data to demonstrate the alignment between source and translated words. For the evaluation of the translated sentences, manual score based human evaluation and automatic evaluation metric based techniques have been adopted. The attention mechanism based neural translation has achieved 88% accuracy in human evaluation and a BLEU score of 0.92 on Sanskrit to Hindi translation.

machine translation, sanskrit, translation, (11 more...)

arXiv.org Artificial Intelligence

doi: 10.14445/22315381/IJETT-V69I9P227

2110.00435

Country:

Asia > India > Gujarat (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(6 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Knowledge Graph Question Answering via SPARQL Silhouette Generation

Purkayastha, Sukannya, Dana, Saswati, Garg, Dinesh, Khandelwal, Dinesh, Bhargav, G P Shrivatsa

arXiv.org Artificial IntelligenceSep-6-2021

Knowledge Graph Question Answering (KGQA) has become a prominent area in natural language processing due to the emergence of large-scale Knowledge Graphs (KGs). Recently Neural Machine Translation based approaches are gaining momentum that translates natural language queries to structured query languages thereby solving the KGQA task. However, most of these methods struggle with out-of-vocabulary words where test entities and relations are not seen during training time. In this work, we propose a modular two-stage neural architecture to solve the KGQA task. The first stage generates a sketch of the target SPARQL called SPARQL silhouette for the input question. This comprises of (1) Noise simulator to facilitate out-of-vocabulary words and to reduce vocabulary size (2) seq2seq model for text to SPARQL silhouette generation. The second stage is a Neural Graph Search Module. SPARQL silhouette generated in the first stage is distilled in the second stage by substituting precise relation in the predicted structure. We simulate ideal and realistic scenarios by designing a noise simulator. Experimental results show that the quality of generated SPARQL silhouette in the first stage is outstanding for the ideal scenarios but for realistic scenarios (i.e. noisy linker), the quality of the resulting SPARQL silhouette drops drastically. However, our neural graph search module recovers it considerably. We show that our method can achieve reasonable performance improving the state-of-art by a margin of 3.72% F1 for the LC-QuAD-1 dataset. We believe, our proposed approach is novel and will lead to dynamic KGQA solutions that are suited for practical applications.

dataset, entity relation, relation, (14 more...)

arXiv.org Artificial Intelligence

2109.09475

Country:

Europe > United Kingdom (0.04)
Antarctica > French Southern and Antarctic Lands (0.04)
Africa > French Southern and Antarctic Lands (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Pushing Paraphrase Away from Original Sentence: A Multi-Round Paraphrase Generation Approach

Lin, Zhe, Wan, Xiaojun

arXiv.org Artificial IntelligenceSep-4-2021

In recent years, neural paraphrase generation based on Seq2Seq has achieved superior performance, however, the generated paraphrase still has the problem of lack of diversity. In this paper, we focus on improving the diversity between the generated paraphrase and the original sentence, i.e., making generated paraphrase different from the original sentence as much as possible. We propose BTmPG (Back-Translation guided multi-round Paraphrase Generation), which leverages multi-round paraphrase generation to improve diversity and employs back-translation to preserve semantic information. We evaluate BTmPG on two benchmark datasets. Both automatic and human evaluation show BTmPG can improve the diversity of paraphrase while preserving the semantics of the original sentence.

computational linguistic, diversity, original sentence, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2021.findings-acl.135

2109.01862

Country:

Asia > India (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
(10 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Add feedback

An Exploratory Study on Utilising the Web of Linked Data for Product Data Mining

Zhang, Ziqi, Song, Xingyi

arXiv.org Artificial IntelligenceSep-3-2021

The Linked Open Data practice has led to a significant growth of structured data on the Web in the last decade. Such structured data describe real-world entities in a machine-readable way, and have created an unprecedented opportunity for research in the field of Natural Language Processing. However, there is a lack of studies on how such data can be used, for what kind of tasks, and to what extent they can be useful for these tasks. This work focuses on the e-commerce domain to explore methods of utilising such structured data to create language resources that may be used for product classification and linking. We process billions of structured data points in the form of RDF n-quads, to create multi-million words of product-related corpora that are later used in three different ways for creating of language resources: training word embedding models, continued pre-training of BERT-like language models, and training Machine Translation models that are used as a proxy to generate product-related keywords. Our evaluation on an extensive set of benchmarks shows word embeddings to be the most reliable and consistent method to improve the accuracy on both tasks (with up to 6.9 percentage points in macro-average F1 on some datasets). The other two methods however, are not as useful. Our analysis shows that this could be due to a number of reasons, including the biased domain representation in the structured data and lack of vocabulary coverage. We share our datasets and discuss how our lessons learned could be taken forward to inform future research in this direction.

classification, dataset, language resource, (15 more...)

arXiv.org Artificial Intelligence

2109.01411

Country:

North America > United States > New York > New York County > New York City (0.05)
Europe > United Kingdom (0.04)
North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
(6 more...)

Genre:

Overview (0.92)
Research Report > New Finding (0.92)

Industry: Information Technology > Services > e-Commerce Services (0.48)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(7 more...)

Add feedback

Masked Adversarial Generation for Neural Machine Translation

Idrissi, Badr Youbi, Clinchant, Stéphane

arXiv.org Artificial IntelligenceSep-1-2021

Attacking Neural Machine Translation models is an inherently combinatorial task on discrete sequences, solved with approximate heuristics. Most methods use the gradient to attack the model on each sample independently. Instead of mechanically applying the gradient, could we learn to produce meaningful adversarial attacks ? In contrast to existing approaches, we learn to attack a model by training an adversarial generator based on a language model. We propose the Masked Adversarial Generation (MAG) model, that learns to perturb the translation model throughout the training process. The experiments show that it improves the robustness of machine translation models, while being faster than competing methods.

computational linguistic, machine translation, translation, (13 more...)

arXiv.org Artificial Intelligence

2109.00417

Country:

Oceania > Australia > Victoria > Melbourne (0.05)
Asia > China > Hong Kong (0.05)
North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
(4 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods

Mogadala, Aditya (Saarland University) | Kalimuthu, Marimuthu (Saarland University) | Klakow, Dietrich (Saarland University)

Journal of Artificial Intelligence ResearchAug-30-2021

Interest in Artificial Intelligence (AI) and its applications has seen unprecedented growth in the last few years. This success can be partly attributed to the advancements made in the sub-fields of AI such as machine learning, computer vision, and natural language processing. Much of the growth in these fields has been made possible with deep learning, a sub-area of machine learning that uses artificial neural networks. This has created significant interest in the integration of vision and language. In this survey, we focus on ten prominent tasks that integrate language and vision by discussing their problem formulation, methods, existing datasets, evaluation measures, and compare the results obtained with corresponding state-of-the-art methods. Our efforts go beyond earlier surveys which are either task-specific or concentrate only on one type of visual content, i.e., image or video. Furthermore, we also provide some potential future directions in this field of research with an anticipation that this survey stimulates innovative thoughts and ideas to address the existing challenges and build new applications.

natural language instruction, vision and language integration task, visual description generation, (16 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.11688

AI Access Foundation

11688

Journal of Artificial Intelligence Research

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(46 more...)

Genre:

Research Report > Promising Solution (1.00)
Overview (1.00)

Industry:

Media > Film (1.00)
Leisure & Entertainment > Sports (1.00)
Information Technology (0.92)
Education > Curriculum > Subject-Specific Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

AMAZON MACHINE LEARNING

#artificialintelligenceAug-29-2021, 07:25:32 GMT

What is Amazon Web Services? Amazon Web Services or AWS is world's broadly adopted cloud platform . AWS provides with a number of useful cloud computing services that are very much reliable, scalable and cost efficient as they say. AWS provides services like storage, networking, remote computing, servers, email, mobile development and security . So now coming to Amazon machine learning, frankly means leveraging ML algorithms on cloud platforms like AWS .

ai service, application, translation, (11 more...)

#artificialintelligence

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.38)

Add feedback

The Vauquois triangle : Mystery solved

#artificialintelligenceAug-28-2021, 05:55:01 GMT

The Vauquois triangle is a classical hierarchical model for visualizing various machine translation approaches. Before we dive into the Vauquois triangle, let's look at what Machine Translation is. Machine translation is the process of using computer software to translate a text or speech in one natural language to another. The definition may look simple, but the process is extremely difficult. Languages differ in so many ways, grammatically, syntactically (sentence structure), semantically (meanings), etc.

language sentence, machine translation, vauquois triangle, (12 more...)

#artificialintelligence

Country: Europe > Denmark > Capital Region > Copenhagen (0.06)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

DeepLobe - Machine Learning API as a Service Platform

#artificialintelligenceAug-26-2021, 02:30:49 GMT

Day by day the number of machine learning models is increasing at a pace. With this increasing rate, it is hard for beginners to choose an effective model to perform Natural Language Understanding (NLU) and Natural Language Generation (NLG) mechanisms. Researchers across the globe are working around the clock to achieve more progress in artificial intelligence to build agile and intuitive sequence-to-sequence learning models. And in recent times transformers are one such model which gained more prominence in the field of machine learning to perform speech-to-text activities. The wide availability of other sequence-to-sequence learning models like RNNs, LSTMs, and GRU always raises a challenge for beginners when they think about transformers.

architecture, sequence, transformer, (14 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.62)

Add feedback