AITopics

2410.07054

Country:

Asia > China (0.28)
North America > United States > Pennsylvania (0.28)
North America > Canada (0.14)
(6 more...)

Genre: Research Report > New Finding (0.92)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Qian, Shenbin, Sindhujan, Archchana, Kabra, Minnie, Kanojia, Diptesh, Orăsan, Constantin, Ranasinghe, Tharindu, Blain, Frédéric

What do Large Language Models Need for Machine Translation Evaluation?

arXiv.org Artificial IntelligenceOct-9-2024

Leveraging large language models (LLMs) for various natural language processing tasks has led to superlative claims about their performance. For the evaluation of machine translation (MT), existing research shows that LLMs are able to achieve results comparable to fine-tuned multilingual pre-trained language models. In this paper, we explore what translation information, such as the source, reference, translation errors and annotation guidelines, is needed for LLMs to evaluate MT quality. In addition, we investigate prompting techniques such as zero-shot, Chain of Thought (CoT) and few-shot prompting for eight language pairs covering high-, medium- and low-resource languages, leveraging varying LLM variants. Our findings indicate the importance of reference translations for an LLM-based evaluation. While larger models do not necessarily fare better, they tend to benefit more from CoT prompting, than smaller models. We also observe that LLMs do not always provide a numerical score when generating evaluations, which poses a question on their reliability for the task. Our work presents a comprehensive analysis for resource-constrained and training-less LLM-based evaluation of machine translation. We release the accrued prompt templates, code and data publicly for reproducibility.

evaluation, language pair, translation, (14 more...)

2410.03278

Country:

Asia > Singapore (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
Europe > United Kingdom > England > Surrey (0.04)
(15 more...)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Neural Information Processing SystemsOct-8-2024, 08:48:45 GMT

Reviews: Deliberation Networks: Sequence Generation Beyond One-Pass Decoding

Two of my major concerns: the weakness of the baseline and the lack of comparison with automatic post-editing have been resolved by the response. I've raised my evaluation with the expectation that these results will be added to the final camera ready version. With regards to the examples, the reason why I said "cherry-picked?" (with a question mark) was because there was no mention of how the examples were chosen. If they were chosen randomly or some other unbiased method that could be noted in the paper. It's OK to cherry-pick representative examples, of course, and it'd be more clear if this was mentioned as well.

deliberation network, neural machine translation, sequence generation, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.43)

Neural Information Processing SystemsOct-8-2024, 07:56:58 GMT

Reviews: Generative Neural Machine Translation

Summary This paper proposes a generative latent variable model for neural machine translation, where inference is performed with variational inference. This extends the work of Zhang et al., 2016, who proposed a conditional model with variational inference. The advantage of the generative model is to force the latent variable to capture more of the semantics of the sentence than the conditional model was able to do. The main disadvantage of this approach is that the value of the latent variable has to be infered during decoding (based on candidate generations). The paper also shows that a version of this model can be trained in a multilingual setting, that monolingual data can be used as semi-supervised training, and that the inference algorithm can be extended to perform translation with missing words.

generative neural machine translation, inference, latent variable, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Neural Information Processing SystemsOct-8-2024, 07:38:06 GMT

Reviews: One-Shot Imitation Learning

Summary --- Complex and useful robotic manipulation tasks are difficult because of the difficulty of manipulation itself, but also because it's difficult to communicate the intent of a task. Both of these problems can be alleviated through the use of imitation learning, but in order for this to be practical the learner must be able to generalize from few examples. This paper presents an architecture inspired by recent work in meta learning which generalizes manipulation of a robot arm from a single task demonstration; i.e., it does one-shot imitation learning. The network is something like a seq2seq model that uses multiple attention mechanisms in the style of "Neural Machine Translation by Jointly Learning to Align and Translate". There is a demonstration network, a context network and a manipulation network.

attention mechanism, demonstration, one-shot imitation learning, (12 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.91)

Allemann, Alexis, Atrio, Àlex R., Popescu-Belis, Andrei

Optimizing the Training Schedule of Multilingual NMT using Reinforcement Learning

Multilingual NMT is a viable solution for translating low-resource languages (LRLs) when data from high-resource languages (HRLs) from the same language family is available. However, the training schedule, i.e. the order of presentation of languages, has an impact on the quality of such systems. Here, in a many-to-one translation setting, we propose to apply two algorithms that use reinforcement learning to optimize the training schedule of NMT: (1) Teacher-Student Curriculum Learning and (2) Deep Q Network. The former uses an exponentially smoothed estimate of the returns of each action based on the loss on monolingual or multilingual development subsets, while the latter estimates rewards using an additional neural network trained from the history of actions selected in different states of the system, together with the rewards received. On a 8-to-1 translation dataset with LRLs and HRLs, our second method improves BLEU and COMET scores with respect to both random selection of monolingual batches and shuffled multilingual batches, by adjusting the number of presentations of LRL vs. HRL batches.

algorithm, computational linguistic, machine translation, (13 more...)

2410.06118

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
(10 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Qian, Shenbin, Orăsan, Constantin, Kanojia, Diptesh, Carmo, Félix do

Are Large Language Models State-of-the-art Quality Estimators for Machine Translation of User-generated Content?

This paper investigates whether large language models (LLMs) are state-of-the-art quality estimators for machine translation of user-generated content (UGC) that contains emotional expressions, without the use of reference translations. To achieve this, we employ an existing emotion-related dataset with human-annotated errors and calculate quality evaluation scores based on the Multi-dimensional Quality Metrics. We compare the accuracy of several LLMs with that of our fine-tuned baseline models, under in-context learning and parameter-efficient fine-tuning (PEFT) scenarios. We find that PEFT of LLMs leads to better performance in score prediction with human interpretable explanations than fine-tuned models. However, a manual analysis of LLM outputs reveals that they still have problems such as refusal to reply to a prompt and unstable output while evaluating machine translation of UGC.

emotion preservation, machine translation, translation, (10 more...)

2410.06338

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > China > Hong Kong (0.05)
Europe > Finland > Pirkanmaa > Tampere (0.04)
(6 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

Carmo, Félix do, Kanojia, Diptesh

Edit Distances and Their Applications to Downstream Tasks in Research and Commercial Contexts

Edit distances are a class of metrics used to quantify the similarity between two text sequences by calculating the minimum number of operations required to transform one sequence into another. These operations typically include insertion, deletion, substitution, and movement of characters or words. The application of edit distances extends beyond simple string comparison and is used extensively in evaluating machinetranslated text against human references, quality estimation, and post-editing tasks. This tutorial is targeted at researchers of machine translation and of human translation, as well as corporate members of AMTA. It focuses on the uses of edit distances, such as TER - Translation Edit Rate (Snover et al., 2006), as proxies of translation effort and as informants of other downstream tasks, such as MT evaluation and post-editing, error annotation with MQM (Burchardt, 2013), quality estimation - QE (Specia et al., 2022) and automatic post-editing - APE (do Carmo et al., 2021). The application of edit distances in downstream tasks often assumes that these accurately represent work done by post-editors and real errors that need to be corrected in MT output. We will discuss how imperfect edit distances are in capturing the details of this error correction work and the implications for researchers and for commercial applications of these uses of edit distances. In terms of commercial applications, we will discuss their integration in computer-assisted translation tools and how the perception of the connection between edit distances and post-editor effort affects the definition of translator rates.

application, edit distance, tutorial, (11 more...)

2410.05881

Country:

Europe > United Kingdom > England > Surrey (0.05)
Europe > Portugal > Porto > Porto (0.05)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry: Information Technology (0.58)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Berger, Nathaniel, Riezler, Stefan, Exel, Miriam, Huck, Matthias

Post-edits Are Preferences Too

Preference Optimization (PO) techniques are currently one of the state of the art techniques for fine-tuning large language models (LLMs) on pairwise preference feedback from human annotators. However, in machine translation, this sort of feedback can be difficult to solicit. Additionally, Kreutzer et al. (2018) have shown that, for machine translation, pairwise preferences are less reliable than other forms of human feedback, such as 5-point ratings. We examine post-edits to see if they can be a source of reliable human preferences by construction. In PO, a human annotator is shown sequences $s_1$ and $s_2$ and asked for a preference judgment, %$s_1 > s_2$; while for post-editing, editors create $s_1$ and know that it should be better than $s_2$. We attempt to use these implicit preferences for PO and show that it helps the model move towards post-edit-like hypotheses and away from machine translation-like hypotheses. Furthermore, we show that best results are obtained by pre-training the model with supervised fine-tuning (SFT) on post-edits in order to promote post-edit-like hypotheses to the top output ranks.

machine translation, probability, sequence, (13 more...)

2410.0232

Country:

Asia > Singapore (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(6 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Buettner, Kyle, Kovashka, Adriana

Quantifying the Gaps Between Translation and Native Perception in Training for Multimodal, Multilingual Retrieval

There is a scarcity of multilingual visionlanguage models that properly account for the perceptual differences that are reflected in image captions across languages and cultures. In this work, through a multimodal, multilingual retrieval case study, we quantify the existing lack of model flexibility. We empirically show Figure 1: Example perception differences between native performance gaps between training on captions English and German speakers. Examples are captions that come from native German perception from Flickr30K (Young et al., 2014) and Multi30K and captions that have been either machinetranslated (Elliott et al., 2016). Note differences in mentioned objects or human-translated from English ("sand arena", "parasol") and specificity ("Heurigen into German. To address these gaps, we further bench" vs. "table", "horse" vs. "bronco"). German propose and evaluate caption augmentation captions here are translated to English.

caption, perception, translation, (15 more...)

2410.02027

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > Canada > Ontario > Toronto (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.70)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)