AITopics | Machine Translation

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice by using free translators such as: CAPITA, Google Translate, SDL International, SYSTRAN.

Make Google Translate From Scratch: JAVASCRIPT, PHP, AJAX

#artificialintelligence | Jan-6-2023, 12:31:15 GMT

A course on working with PHP and JavaScript at the same time using AJAX.

artificial intelligence, natural language, programming language, (7 more...)

#artificialintelligence

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.73)

Building Transformer Models with Attention Crash Course. Build a Neural Machine Translator in 12 Days - MachineLearningMastery.com

#artificialintelligence | Jan-6-2023, 00:30:05 GMT

Moreover, when you compare the diagram of the transformer model with your implementation here, you should notice that the diagram shows a softmax layer at the output, but we omitted it. The softmax is indeed added in this lesson. Do you see where it is? In the next lesson, you will train this compiled model, which has 14 million parameters, as we can see in the summary above. Training the transformer depends on everything you created in all previous lessons. Most importantly, the vectorizer and dataset from Lesson 03 must be saved, as they will be reused in this and the next lessons. Running this script will take several hours, but once it is finished, you will have the model saved and the loss and accuracy plotted.

artificial intelligence, machine learning, natural language, (18 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.83)

About Machine Translation. A brief about Machine Translation:

#artificialintelligence | Jan-6-2023, 00:20:13 GMT

Machine translation is the process of using computer software to automatically translate text or speech from one language to another. It is a rapidly evolving field, with a wide range of applications, including language education, international communication, and the facilitation of cross-cultural understanding. There are two main types of machine translation: rule-based and statistical. Rule-based machine translation relies on a set of predetermined rules for translating text from one language to another. These rules are created by linguists and language experts, and the translations produced by this type of machine translation are generally more accurate and faithful to the source language.

artificial intelligence, natural language, translation, (11 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.31)

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Building a Parallel Corpus and Training Translation Models Between Luganda and English

Kimera, Richard, Rim, Daniela N., Choi, Heeyoul

arXiv.org Artificial Intelligence | Jan-6-2023

Neural machine translation (NMT) has achieved great success with large datasets, so NMT is premised mostly on high-resource languages. This continually undermines low-resource languages such as Luganda due to the lack of high-quality parallel corpora, so even Google Translate does not serve Luganda at the time of this writing. In this paper, we build a parallel corpus of 41,070 sentence pairs for Luganda and English, based on three different open-source corpora. Then, we train NMT models with hyper-parameter search on the dataset. Experiments gave us a BLEU score of 21.28 from Luganda to English and 17.47 from English to Luganda. Some translation examples show high translation quality. We believe that our model is the first Luganda-English NMT model. The bilingual dataset we built will be available to the public.
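
The BLEU scores quoted in this abstract measure n-gram overlap between system output and reference translations. As a rough illustration only, the Python sketch below computes a simplified single-reference, sentence-level BLEU without smoothing. The abstract does not say which BLEU implementation or tokenization the authors used, so the function names (`ngrams`, `bleu`) and the whitespace tokenization are assumptions made for this example. The result is on a 0 to 1 scale, whereas scores such as 21.28 are conventionally reported on a 0 to 100 scale.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU on whitespace tokens, in [0, 1].

    Hypothetical helper for illustration: no smoothing, uniform weights.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped counts: a candidate n-gram is credited at most as many
        # times as it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)
```

For instance, `bleu("the cat sat on", "the cat sat on the mat")` has perfect n-gram precision but is shorter than the reference, so only the brevity penalty lowers its score. Production evaluations normally use a standardized tool so that scores are comparable across papers.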

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.5626/JOK.2022.49.11.1009

2301.02773

Country:

Africa > Uganda (0.07)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(5 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Boosting Neural Networks to Decompile Optimized Binaries

Cao, Ying, Liang, Ruigang, Chen, Kai, Hu, Peiwei

arXiv.org Artificial Intelligence | Jan-3-2023

Decompilation aims to transform a low-level program language (LPL) (e.g., binary file) into its functionally-equivalent high-level program language (HPL) (e.g., C/C++). It is a core technology in software security, especially in vulnerability discovery and malware analysis. In recent years, with the successful application of neural machine translation (NMT) models in natural language processing (NLP), researchers have tried to build neural decompilers by borrowing the idea of NMT. They formulate the decompilation process as a translation problem between LPL and HPL, aiming to reduce the human cost required to develop decompilation tools and improve their generalizability. However, state-of-the-art learning-based decompilers do not cope well with compiler-optimized binaries. Since real-world binaries are mostly compiler-optimized, decompilers that do not consider optimized binaries have limited practical significance. In this paper, we propose a novel learning-based approach named NeurDP, that targets compiler-optimized binaries. NeurDP uses a graph neural network (GNN) model to convert LPL to an intermediate representation (IR), which bridges the gap between source code and optimized binary. We also design an Optimized Translation Unit (OTU) to split functions into smaller code fragments for better translation performance. Evaluation results on datasets containing various types of statements show that NeurDP can decompile optimized binaries with 45.21% higher accuracy than state-of-the-art neural decompilation frameworks.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3564625.3567998

2301.00969

Country:

North America > United States > Texas > Travis County > Austin (0.15)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.14)
(23 more...)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Statistical Machine Translation for Indic Languages

Das, Sudhansu Bala, Panda, Divyajoti, Mishra, Tapas Kumar, Patra, Bidyut Kr.

arXiv.org Artificial Intelligence | Jan-2-2023

A Machine Translation (MT) system generally aims at automatically rendering a source language into a target language while retaining the original context, using various Natural Language Processing (NLP) techniques. Among these NLP methods, Statistical Machine Translation (SMT) uses probabilistic and statistical techniques to analyze information and perform the conversion. This paper discusses the development of bilingual SMT models for translating English to fifteen low-resource Indian Languages (ILs) and vice versa. At the outset, all 15 languages are briefly described in relation to our experimental needs. Further, a detailed analysis of the Samanantar and OPUS datasets for model building, along with the standard benchmark dataset (Flores-200) for fine-tuning and testing, is done as a part of our experiment. Different preprocessing approaches are proposed in this paper to handle the noise of the dataset. To create the system, the MOSES open-source SMT toolkit is explored. Distance reordering is utilized with the aim to understand the rules of grammar and context-dependent adjustments through a phrase reordering categorization framework. In our experiment, the quality of the translation is evaluated using standard metrics such as BLEU, METEOR, and RIBES.

machine learning, natural language, translation, (18 more...)

arXiv.org Artificial Intelligence

2301.00539

Country:

North America > United States (0.14)
Asia > Pakistan (0.05)
Africa > Middle East > Egypt > Giza Governorate > Giza (0.04)
(20 more...)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Sequence to sequence pretraining for a less-resourced Slovenian language

Ulčar, Matej, Robnik-Šikonja, Marko

arXiv.org Artificial Intelligence | Jan-2-2023

Large pretrained language models have recently conquered the area of natural language processing. As an alternative to the predominant masked language modelling introduced in BERT, the T5 model has introduced a more general training objective, namely sequence-to-sequence transformation, which includes masked language modelling but more naturally fits text generation tasks such as machine translation, summarization, question answering, text simplification, dialogue systems, etc. The monolingual variants of T5 models have been limited to well-resourced languages, while the massively multilingual T5 model supports 101 languages. In contrast, we trained two differently sized T5-type sequence-to-sequence models for the morphologically rich Slovene language with far fewer resources and analyzed their behavior on 11 tasks. Concerning classification tasks, the SloT5 models mostly lag behind the monolingual Slovene SloBERTa model but are useful for the generative tasks.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2207.13988

Country:

Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.05)
Europe > Germany (0.04)
Europe > Slovenia > Upper Carniola > Municipality of Kranj > Kranj (0.04)
(6 more...)

Genre: Research Report (0.82)

Industry: Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
(2 more...)

Text Style Transfer: A Review and Experimental Evaluation

Hu, Zhiqiang, Lee, Roy Ka-Wei, Aggarwal, Charu C., Zhang, Aston

arXiv.org Artificial Intelligence | Jan-1-2023

The stylistic properties of text have intrigued computational linguistics researchers in recent years. Specifically, researchers have investigated the Text Style Transfer (TST) task, which aims to change the stylistic properties of the text while retaining its style-independent content. Over the last few years, many novel TST algorithms have been developed, while the industry has leveraged these algorithms to enable exciting TST applications. The field of TST research has burgeoned because of this symbiosis. This article aims to provide a comprehensive review of recent research efforts on text style transfer. More concretely, we create a taxonomy to organize the TST models and provide a comprehensive summary of the state of the art. We review the existing evaluation methodologies for TST tasks and conduct a large-scale reproducibility study where we experimentally benchmark 19 state-of-the-art TST algorithms on two publicly available datasets. Finally, we expand on current trends and provide new perspectives on the new and exciting developments in the TST field.

machine learning, natural language, reinforcement learning, (22 more...)

arXiv.org Artificial Intelligence

2010.12742

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Oceania > Australia (0.04)
(16 more...)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.67)
Research Report > New Finding (0.45)

Industry:

Education (1.00)
Information Technology (0.92)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(4 more...)

Active Learning for Neural Machine Translation

Vashistha, Neeraj, Singh, Kriti, Shakya, Ramakant

arXiv.org Artificial Intelligence | Dec-30-2022

The machine translation mechanism translates texts automatically between different natural languages, and Neural Machine Translation (NMT) has gained attention for its rational context analysis and fluent translation accuracy. However, processing low-resource languages that lack relevant training attributes like supervised data is a current challenge for Natural Language Processing (NLP). We incorporated a technique known as Active Learning with the NMT toolkit Joey NMT to reach sufficient accuracy and robust predictions of low-resource language translation. With active learning, a semi-supervised machine learning strategy, the training algorithm determines which unlabeled data would be the most beneficial for obtaining labels using selected query techniques. We implemented two model-driven acquisition functions for selecting the samples to be validated. This work uses transformer-based NMT systems: baseline model (BM), fully trained model (FTM), active learning least confidence based model (ALLCM), and active learning margin sampling based model (ALMSM) when translating English to Hindi. The Bilingual Evaluation Understudy (BLEU) metric has been used to evaluate system results. The BLEU scores of the BM, FTM, ALLCM, and ALMSM systems are 16.26, 22.56, 24.54, and 24.20, respectively. The findings in this paper demonstrate that active learning techniques help the model to converge early and improve the overall quality of the translation system.
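
The two acquisition functions named in this abstract, least confidence and margin sampling, can be illustrated with a minimal Python sketch. It assumes each unlabeled sample comes with a single predicted probability distribution. The abstract does not describe how the authors score whole translated sequences, so the function names (`least_confidence`, `margin`, `select`), the `pool` data, and the per-sample distributions are hypothetical and do not reproduce the paper's Joey NMT implementation.

```python
def least_confidence(probs):
    """Uncertainty as 1 minus the top predicted probability (higher = less sure)."""
    return 1.0 - max(probs)


def margin(probs):
    """Gap between the two most probable outcomes (smaller = less sure)."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2


def select(pool, k, strategy):
    """Return the ids of the k samples the strategy considers most informative.

    pool is a list of (sample_id, probabilities) pairs; this data layout is an
    assumption made for the example.
    """
    if strategy == "least_confidence":
        ranked = sorted(pool, key=lambda item: least_confidence(item[1]), reverse=True)
    elif strategy == "margin":
        ranked = sorted(pool, key=lambda item: margin(item[1]))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [sample_id for sample_id, _ in ranked[:k]]


# Made-up model outputs for three unlabeled sentences.
pool = [
    ("sent_a", [0.90, 0.05, 0.05]),
    ("sent_b", [0.40, 0.35, 0.25]),
    ("sent_c", [0.50, 0.49, 0.01]),
]

print(select(pool, 1, "least_confidence"))  # ['sent_b']
print(select(pool, 1, "margin"))            # ['sent_c']
```

The two strategies can disagree: `sent_b` has the lowest top probability, while `sent_c` has the narrowest gap between its two best options, so each strategy sends a different sentence for labeling first.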

machine learning, natural language, neural machine translation, (2 more...)

arXiv.org Artificial Intelligence

2301.00688

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

A Discussion on Building Practical NLP Leaderboards: The Case of Machine Translation

Santy, Sebastin, Bhattacharya, Prasanta

arXiv.org Artificial Intelligence | Dec-30-2022

Recent advances in AI and ML applications have benefited from rapid progress in NLP research. Leaderboards have emerged as a popular mechanism to track and accelerate progress in NLP through competitive model development. While this has increased interest and participation, the over-reliance on single, accuracy-based metrics has shifted focus from other important metrics that might be equally pertinent to consider in real-world contexts. In this paper, we offer a preliminary discussion of the risks associated with focusing exclusively on accuracy metrics and draw on recent discussions to highlight prescriptive suggestions on how to develop more practical and effective leaderboards that can better reflect the real-world utility of models.

artificial intelligence, machine learning, natural language, (13 more...)

arXiv.org Artificial Intelligence

2106.06292

Country:

Europe > Belgium > Brussels-Capital Region > Brussels (0.05)
Asia > Indonesia > Bali (0.05)
Oceania > Australia > Victoria > Melbourne (0.04)
(10 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
