AITopics | Machine Translation

Collaborating Authors

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice by using free translators such as: CAPITA, Google Translate, SDL International, SYSTRAN.

News Overviews Instructional Materials AI-Alerts Classics

IKUN for WMT24 General MT Task: LLMs Are here for Multilingual Machine Translation

Liao, Baohao, Herold, Christian, Khadivi, Shahram, Monz, Christof

arXiv.org Artificial IntelligenceAug-29-2024

This paper introduces two multilingual systems, IKUN and IKUN-C, developed for the general machine translation task in WMT24. IKUN and IKUN-C represent an open system and a constrained system, respectively, built on Llama-3-8b and Mistral-7B-v0.3. Both systems are designed to handle all 11 language directions using a single model. According to automatic evaluation metrics, IKUN-C achieved 6 first-place and 3 second-place finishes among all constrained systems, while IKUN secured 1 first-place and 2 second-place finishes across both open and constrained systems. These encouraging results suggest that large language models (LLMs) are nearing the level of proficiency required for effective multilingual machine translation. The systems are based on a two-stage approach: first, continuous pre-training on monolingual data in 10 languages, followed by fine-tuning on high-quality parallel data for 11 language directions. The primary difference between IKUN and IKUN-C lies in their monolingual pre-training strategy. IKUN-C is pre-trained using constrained monolingual data, whereas IKUN leverages monolingual data from the OSCAR dataset. In the second phase, both systems are fine-tuned on parallel data sourced from NTREX, Flores, and WMT16-23 for all 11 language pairs.

machine translation, proceedings, translation, (13 more...)

arXiv.org Artificial Intelligence

2408.11512

Country:

Europe > Austria > Vienna (0.14)
Asia > Singapore (0.05)
Europe > Germany > Saxony > Leipzig (0.04)
(11 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Instruction-tuned Large Language Models for Machine Translation in the Medical Domain

Rios, Miguel

arXiv.org Artificial IntelligenceAug-29-2024

Large Language Models (LLMs) have shown promising results on machine translation for high resource language pairs and domains. However, in specialised domains (e.g. medical) LLMs have shown lower performance compared to standard neural machine translation models. The consistency in the machine translation of terminology is crucial for users, researchers, and translators in specialised domains. In this study, we compare the performance between baseline LLMs and instruction-tuned LLMs in the medical domain. In addition, we introduce terminology from specialised medical dictionaries into the instruction formatted datasets for fine-tuning LLMs. The instruction-tuned LLMs significantly outperform the baseline models with automatic metrics.

language model, machine translation, medical domain

arXiv.org Artificial Intelligence

2408.1644

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

ConCSE: Unified Contrastive Learning and Augmentation for Code-Switched Embeddings

Jeon, Jangyeong, Cho, Sangyeon, Ma, Minuk, Kim, Junyoung

arXiv.org Artificial IntelligenceAug-28-2024

This paper examines the Code-Switching (CS) phenomenon where two languages intertwine within a single utterance. There exists a noticeable need for research on the CS between English and Korean. We highlight that the current Equivalence Constraint (EC) theory for CS in other languages may only partially capture English-Korean CS complexities due to the intrinsic grammatical differences between the languages. We introduce a novel Koglish dataset tailored for English-Korean CS scenarios to mitigate such challenges. First, we constructed the Koglish-GLUE dataset to demonstrate the importance and need for CS datasets in various tasks. We found the differential outcomes of various foundation multilingual language models when trained on a monolingual versus a CS dataset. Motivated by this, we hypothesized that SimCSE, which has shown strengths in monolingual sentence embedding, would have limitations in CS scenarios. We construct a novel Koglish-NLI (Natural Language Inference) dataset using a CS augmentation-based approach to verify this. From this CS-augmented dataset Koglish-NLI, we propose a unified contrastive learning and augmentation method for code-switched embeddings, ConCSE, highlighting the semantics of CS sentences. Experimental results validate the proposed ConCSE with an average performance enhancement of 1.77\% on the Koglish-STS(Semantic Textual Similarity) tasks.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2409.0012

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Pennsylvania (0.04)
(5 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)

Add feedback

Making the Most of your Model: Methods for Finetuning and Applying Pretrained Transformers

Yoshida, Davis

arXiv.org Artificial IntelligenceAug-28-2024

This thesis provides methods and analysis of models which make progress on this goal. The techniques outlined are task agnostic, and should provide benefit when used with nearly any transformer LM. We introduce two new finetuning methods which add new capabilities to the models they are used on. The first adds a recurrence mechanism, which removes the fixed-window sized constraint and improves the efficiency of a transformer decoder. The second allows masked language models (MLMs) to be used for initialization of both the encoder and decoder of a non-autoregressive sequence-to-sequence transformer, opening up generative applications of models which were previously only used for natural language understanding tasks. We also introduce two new techniques for improving the quality of predictions of any transformer decoder without additional finetuning. One, hidden state optimization, can be applied to any transformer decoder to improve the quality of predictions at inference time, especially for few-shot classification. The other, conditional beam search, allows practitioners to search for natural language generation (NLG) model outputs with high likelihood while conditioning on the event that the output is not degenerate (e.g. empty, repetitive, etc.). Finally, we provide theoretical and empirical insights on the divergence of model-likelihood and output quality which has widely been observed in prior work. These insights apply to any model which represents a distribution over text, and apply to language models which are not transformers or even autoregressive. We argue that the NLP community has, to some extent, misunderstood the implications of these findings, and encourage a point of view which has more nuance.

language modeling validation test perplexity, marianmt zh-en translation model, natural language processing, (12 more...)

arXiv.org Artificial Intelligence

2408.16241

Country:

North America > United States > Missouri > Jackson County > Kansas City (0.13)
Asia > China > Liaoning Province > Shenyang (0.04)
Asia > China > Beijing > Beijing (0.04)
(30 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Media (1.00)
Education (0.67)
Information Technology > Security & Privacy (0.67)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(4 more...)

Add feedback

An Investigation of Warning Erroneous Chat Translations in Cross-lingual Communication

Li, Yunmeng, Suzuki, Jun, Morishita, Makoto, Abe, Kaori, Inui, Kentaro

arXiv.org Artificial IntelligenceAug-28-2024

The complexities of chats pose significant challenges for machine translation models. Recognizing the need for a precise evaluation metric to address the issues of chat translation, this study introduces Multidimensional Quality Metrics for Chat Translation (MQM-Chat). Through the experiments of five models using MQM-Chat, we observed that all models generated certain fundamental errors, while each of them has different shortcomings, such as omission, overly correcting ambiguous source content, and buzzword issues, resulting in the loss of stylized information. Our findings underscore the effectiveness of MQM-Chat in evaluating chat translation, emphasizing the importance of stylized content and dialogue consistency for future studies.

participant, translation, warning message, (11 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2023.ijcnlp-srw.2

2408.15543

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Belgium (0.05)
Oceania > Australia > Victoria > Melbourne (0.04)
(6 more...)

Genre: Research Report > New Finding (0.86)

Industry: Information Technology (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

From Rule-Based Models to Deep Learning Transformers Architectures for Natural Language Processing and Sign Language Translation Systems: Survey, Taxonomy and Performance Evaluation

Shahin, Nada, Ismail, Leila

arXiv.org Artificial IntelligenceAug-27-2024

With the growing Deaf and Hard of Hearing population worldwide and the persistent shortage of certified sign language interpreters, there is a pressing need for an efficient, signs-driven, integrated end-to-end translation system, from sign to gloss to text and vice-versa. There has been a wealth of research on machine translations and related reviews. However, there are few works on sign language machine translation considering the particularity of the language being continuous and dynamic. This paper aims to address this void, providing a retrospective analysis of the temporal evolution of sign language machine translation algorithms and a taxonomy of the Transformers architectures, the most used approach in language translation. We also present the requirements of a real-time Quality-of-Service sign language ma-chine translation system underpinned by accurate deep learning algorithms. We propose future research directions for sign language translation systems.

dataset, sign language, translation, (11 more...)

arXiv.org Artificial Intelligence

2408.14825

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > Austria > Vienna (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(8 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Information Technology > Security & Privacy (1.00)
Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

How to Train Text Summarization Model with Weak Supervisions

Wang, Yanbo, Chen, Wenyu, Shan, Shimin

arXiv.org Artificial IntelligenceAug-26-2024

Currently, machine learning techniques have seen significant success across various applications. Most of these techniques rely on supervision from human-generated labels or a mixture of noisy and imprecise labels from multiple sources. However, for certain complex tasks, even noisy or inexact labels are unavailable due to the intricacy of the objectives. To tackle this issue, we propose a method that breaks down the complex objective into simpler tasks and generates supervision signals for each one. We then integrate these supervision signals into a manageable form, resulting in a straightforward learning procedure. As a case study, we demonstrate a system used for topic-based summarization. This system leverages rich supervision signals to promote both summarization and topic relevance. Remarkably, we can train the model end-to-end without any labels. Experimental results indicate that our approach performs exceptionally well on the CNN and DailyMail datasets.

arxiv preprint arxiv, summarization, supervision, (14 more...)

arXiv.org Artificial Intelligence

2409.00098

Country:

Asia > China (0.04)
North America > United States > Virginia (0.04)
Europe > Spain > Galicia > Madrid (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Sports > Soccer (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Add feedback

Guardians of the Machine Translation Meta-Evaluation: Sentinel Metrics Fall In!

Perrella, Stefano, Proietti, Lorenzo, Scirè, Alessandro, Barba, Edoardo, Navigli, Roberto

arXiv.org Artificial IntelligenceAug-25-2024

Annually, at the Conference of Machine Translation (WMT), the Metrics Shared Task organizers conduct the meta-evaluation of Machine Translation (MT) metrics, ranking them according to their correlation with human judgments. Their results guide researchers toward enhancing the next generation of metrics and MT systems. With the recent introduction of neural metrics, the field has witnessed notable advancements. Nevertheless, the inherent opacity of these metrics has posed substantial challenges to the meta-evaluation process. This work highlights two issues with the meta-evaluation framework currently employed in WMT, and assesses their impact on the metrics rankings. To do this, we introduce the concept of sentinel metrics, which are designed explicitly to scrutinize the meta-evaluation process's accuracy, robustness, and fairness. By employing sentinel metrics, we aim to validate our findings, and shed light on and monitor the potential biases or inconsistencies in the rankings. We discover that the present meta-evaluation framework favors two categories of metrics: i) those explicitly trained to mimic human quality assessments, and ii) continuous metrics. Finally, we raise concerns regarding the evaluation capabilities of state-of-the-art metrics, emphasizing that they might be basing their assessments on spurious correlations found in their training data.

machine learning, metric, natural language, (15 more...)

arXiv.org Artificial Intelligence

2408.13831

Country:

Asia > Singapore (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)
(7 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Bidirectional Awareness Induction in Autoregressive Seq2Seq Models

Hu, Jia Cheng, Cavicchioli, Roberto, Capotondi, Alessandro

arXiv.org Artificial IntelligenceAug-25-2024

Autoregressive Sequence-To-Sequence models are the foundation of many Deep Learning achievements in major research fields such as Vision and Natural Language Processing. Despite that, they still present significant limitations. For instance, when errors occur in the early steps of the prediction, the whole output is severely affected. Such reliance on previously predicted tokens and the inherent computational unfriendliness of sequential algorithms, motivated researchers to explore different architectures and methods in the search for bidirectional approaches. In this work, we introduce the Bidirectional Awareness Induction (BAI), a training method that leverages a subset of elements in the network, the Pivots, to perform bidirectional learning without breaking the autoregressive constraints. To showcase its flexibility, we apply the method to three architectures, the Transformer, ExpansionNet v2 and GPT, then perform experiments over three tasks. Experimental results showcase BAI's effectiveness on all selected tasks and architectures. In particular, we observed an increase of up to 2.4 CIDEr in Image-Captioning, 4.96 BLEU in Neural Machine Translation, and 1.16 ROUGE in Text Summarization compared to the respective baselines. Notably, BAI not only has a positive impact on models trained from scratch but on pre-trained models as well. Such an aspect, combined with the absence of architectural requirements synergizes well with the current trend of LLMs.

architecture, bai, bidirectional awareness induction, (9 more...)

arXiv.org Artificial Intelligence

2408.13959

Country:

Europe > Switzerland (0.04)
Europe > Italy > Emilia-Romagna > Modeno Province > Modena (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.95)

Add feedback

Path-Consistency: Prefix Enhancement for Efficient Inference in LLM

Zhu, Jiace, Shen, Yingtao, Zhao, Jie, Zou, An

arXiv.org Artificial IntelligenceAug-24-2024

To enhance the reasoning capabilities of large language models (LLMs), self-consistency has gained significant popularity by combining multiple sampling with majority voting. However, the state-of-the-art self-consistency approaches consume substantial computational resources and lead to significant additional time costs due to the multiple sampling. This prevents its full potential from being realized in scenarios where computational resources are critical. To improve the inference efficiency, this paper introduces \textit{path-consistency}, a method that leverages the confidence of answers generated in earlier branches to identify the prefix of the most promising path. By dynamically guiding the generation of subsequent branches based on this prefix, the \textit{path-consistency} mitigates both the errors and redundancies from random or less useful sampling in self-consistency. As a result, it can significantly accelerate the inference process by reducing the number of tokens generated. Our extensive empirical evaluation shows that the \textit{path-consistency} achieves significant acceleration in inference latency ranging from $7.8\%$ to $40.5\%$, while maintaining or even improving task accuracy across different datasets, including mathematical reasoning, common sense reasoning, symbolic reasoning, and code generation.

reasoning, threshold, translation, (16 more...)

arXiv.org Artificial Intelligence

2409.01281

Country:

North America > United States > Pennsylvania (0.04)
Europe > Poland > Pomerania Province (0.04)
Asia > Vietnam (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.68)

Add feedback