AITopics

2305.07991

Country:

Europe > France (0.28)
Europe > Russia (0.14)
Asia > Russia (0.14)
(16 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry:

Law (1.00)
Government > Regional Government (1.00)
Media (0.93)
(3 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

arXiv.org Artificial IntelligenceOct-11-2023

DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation

Fang, Qingkai, Zhou, Yan, Feng, Yang

Direct speech-to-speech translation (S2ST) translates speech from one language into another using a single model. However, due to the presence of linguistic and acoustic diversity, the target speech follows a complex multimodal distribution, posing challenges to achieving both high-quality translations and fast decoding speeds for S2ST models. In this paper, we propose DASpeech, a non-autoregressive direct S2ST model which realizes both fast and high-quality S2ST. To better capture the complex distribution of the target speech, DASpeech adopts the two-pass architecture to decompose the generation process into two steps, where a linguistic decoder first generates the target text, and an acoustic decoder then generates the target speech based on the hidden states of the linguistic decoder. Specifically, we use the decoder of DA-Transformer as the linguistic decoder, and use FastSpeech 2 as the acoustic decoder. DA-Transformer models translations with a directed acyclic graph (DAG). To consider all potential paths in the DAG during training, we calculate the expected hidden states for each target token via dynamic programming, and feed them into the acoustic decoder to predict the target mel-spectrogram. During inference, we select the most probable path and take hidden states on that path as input to the acoustic decoder. Experiments on the CVSS Fr-En benchmark demonstrate that DASpeech can achieve comparable or even better performance than the state-of-the-art S2ST model Translatotron 2, while preserving up to 18.53x speedup compared to the autoregressive baseline. Compared with the previous non-autoregressive S2ST model, DASpeech does not rely on knowledge distillation and iterative decoding, achieving significant improvements in both translation quality and decoding speed. Furthermore, DASpeech shows the ability to preserve the speaker's voice of the source speech during translation.

daspeech, directed acyclic transformer, fast and high-quality speech-to-speech translation

2310.07403

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.89)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.60)

Duret, Jarod, O'Brien, Benjamin, Estève, Yannick, Parcollet, Titouan

Enhancing expressivity transfer in textless speech-to-speech translation

arXiv.org Artificial IntelligenceOct-11-2023

Textless speech-to-speech translation systems are rapidly advancing, thanks to the integration of self-supervised learning techniques. However, existing state-of-the-art systems fall short when it comes to capturing and transferring expressivity accurately across different languages. Expressivity plays a vital role in conveying emotions, nuances, and cultural subtleties, thereby enhancing communication across diverse languages. To address this issue this study presents a novel method that operates at the discrete speech unit level and leverages multilingual emotion embeddings to capture language-agnostic information. Specifically, we demonstrate how these embeddings can be used to effectively predict the pitch and duration of speech units in the target language. Through objective and subjective experiments conducted on a French-to-English translation task, our findings highlight the superior expressivity transfer achieved by our approach compared to current state-of-the-art systems.

enhancing expressivity transfer, textless speech-to-speech translation

2310.07279

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.89)
Information Technology > Artificial Intelligence > Machine Learning (0.87)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.60)

Finkelstein, Mara, Naskar, Subhajit, Mirzazadeh, Mehdi, Shah, Apurva, Freitag, Markus

MBR and QE Finetuning: Training-time Distillation of the Best and Most Expensive Decoding Methods

arXiv.org Artificial IntelligenceOct-11-2023

Recent research in decoding methods for Natural Language Generation (NLG) tasks has shown that MAP decoding is not optimal, because model probabilities do not always align with human preferences. Stronger decoding methods, including Quality Estimation (QE) reranking and Minimum Bayes' Risk (MBR) decoding, have since been proposed to mitigate the model-perplexity-vs-quality mismatch. While these decoding methods achieve state-of-the-art performance, they are prohibitively expensive to compute. In this work, we propose MBR finetuning and QE finetuning which distill the quality gains from these decoding methods at training time, while using an efficient decoding algorithm at inference time. Using the canonical NLG task of Neural Machine Translation (NMT), we show that even with self-training, these finetuning methods significantly outperform the base model. Moreover, when using an external LLM as a teacher model, these finetuning methods outperform finetuning on human-generated references. These findings suggest new ways to leverage monolingual data to achieve improvements in model quality that are on par with, or even exceed, improvements from human-curated data, while maintaining maximum efficiency during decoding.

expensive decoding method, mbr and qe finetuning, training-time distillation

2309.10966

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.80)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.53)

arXiv.org Artificial IntelligenceOct-10-2023

Quality Control at Your Fingertips: Quality-Aware Translation Models

Tomani, Christian, Vilar, David, Freitag, Markus, Cherry, Colin, Naskar, Subhajit, Finkelstein, Mara, Cremers, Daniel

Maximum-a-posteriori (MAP) decoding is the most widely used decoding strategy for neural machine translation (NMT) models. The underlying assumption is that model probability correlates well with human judgment, with better translations being more likely. However, research has shown that this assumption does not always hold, and decoding strategies which directly optimize a utility function, like Minimum Bayes Risk (MBR) or Quality-Aware decoding can significantly improve translation quality over standard MAP decoding. The main disadvantage of these methods is that they require an additional model to predict the utility, and additional steps during decoding, which makes the entire process computationally demanding. In this paper, we propose to make the NMT models themselves quality-aware by training them to estimate the quality of their own output. During decoding, we can use the model's own quality estimates to guide the generation process and produce the highest-quality translations possible. We demonstrate that the model can self-evaluate its own output during translation, eliminating the need for a separate quality estimation model. Moreover, we show that using this quality signal as a prompt during MAP decoding can significantly improve translation quality. When using the internal quality estimate to prune the hypothesis space during MBR decoding, we can not only further improve translation quality, but also reduce inference speed by two orders of magnitude.

computational linguistic, mbr, translation, (13 more...)

2310.06707

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Italy > Tuscany > Florence (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
(18 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding

Bae, Sangmin, Ko, Jongwoo, Song, Hwanjun, Yun, Se-Young

To tackle the high inference latency exhibited by autoregressive language models, previous studies have proposed an early-exiting framework that allocates adaptive computation paths for each token based on the complexity of generating the subsequent token. However, we observed several shortcomings, including performance degradation caused by a state copying mechanism or numerous exit paths, and sensitivity to exit confidence thresholds. Consequently, we propose a Fast and Robust Early-Exiting (FREE) framework, which incorporates a shallow-deep module and a synchronized parallel decoding. Our framework enables faster inference by synchronizing the decoding process of the current token with previously stacked early-exited tokens. Furthermore, as parallel decoding allows us to observe predictions from both shallow and deep models, we present a novel adaptive threshold estimator that exploits a Beta mixture model to determine suitable confidence thresholds. We empirically demonstrated the superiority of our proposed framework on extensive generation tasks.

computational linguistic, dataset, threshold, (15 more...)

2310.05424

Country:

North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Europe > Italy > Tuscany > Florence (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(13 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)

Bogoychev, Nikolay, Chen, Pinzhen

Terminology-Aware Translation with Constrained Decoding and Large Language Model Prompting

Terminology correctness is important in the downstream application of machine translation, and a prevalent way to ensure this is to inject terminology constraints into a translation system. In our submission to the WMT 2023 terminology translation task, we adopt a translate-then-refine approach which can be domain-independent and requires minimal manual efforts. We annotate random source words with pseudo-terminology translations obtained from word alignment to first train a terminology-aware model. Further, we explore two post-processing methods. First, we use an alignment process to discover whether a terminology constraint has been violated, and if so, we re-decode with the violating word negatively constrained. Alternatively, we leverage a large language model to refine a hypothesis by providing it with terminology constraints. Results show that our terminology-aware model learns to incorporate terminologies effectively, and the large language model refinement process can further improve terminology recall.

constraint, proceedings, translation, (16 more...)

2310.05824

Country: Europe (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Vico, Gianluca, Spanakis, Gerasimos

Larth: Dataset and Machine Translation for Etruscan

Etruscan is an ancient language spoken in Italy from the 7th century BC to the 1st century AD. There are no native speakers of the language at the present day, and its resources are scarce, as there exist only around 12,000 known inscriptions. To the best of our knowledge, there are no publicly available Etruscan corpora for natural language processing. Therefore, we propose a dataset for machine translation from Etruscan to English, which contains 2891 translated examples from existing academic sources. Some examples are extracted manually, while others are acquired in an automatic way. Along with the dataset, we benchmark different machine translation models observing that it is possible to achieve a BLEU score of 10.1 with a small transformer model. Releasing the dataset can help enable future research on this language, similar languages or other languages with scarce resources.

etruscan, sequence, translation, (15 more...)

2310.05688

Country:

Europe > Italy (0.24)
North America > United States (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)
(6 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Understanding Translationese in Cross-Lingual Summarization

Wang, Jiaan, Meng, Fandong, Liang, Yunlong, Zhang, Tingyi, Xu, Jiarong, Li, Zhixu, Zhou, Jie

Given a document in a source language, cross-lingual summarization (CLS) aims at generating a concise summary in a different target language. Unlike monolingual summarization (MS), naturally occurring source-language documents paired with target-language summaries are rare. To collect large-scale CLS data, existing datasets typically involve translation in their creation. However, the translated text is distinguished from the text originally written in that language, i.e., translationese. In this paper, we first confirm that different approaches of constructing CLS datasets will lead to different degrees of translationese. Then we systematically investigate how translationese affects CLS model evaluation and performance when it appears in source documents or target summaries. In detail, we find that (1) the translationese in documents or summaries of test sets might lead to the discrepancy between human judgment and automatic evaluation; (2) the translationese in training sets would harm model performance in real-world applications; (3) though machine-translated documents involve translationese, they are very useful for building CLS systems on low-resource languages under specific training strategies. Lastly, we give suggestions for future CLS research including dataset and model developments. We hope that our work could let researchers notice the phenomenon of translationese in CLS and take it into account in the future.

computational linguistic, dataset, translationese, (14 more...)

2212.0722

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Asia > China > Shanghai > Shanghai (0.04)
(21 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)

arXiv.org Artificial IntelligenceOct-8-2023

Synslator: An Interactive Machine Translation Tool with Online Learning

Wang, Jiayi, Wang, Ke, Zhou, Fengming, Wang, Chengyu, Fu, Zhiyong, Feng, Zeyu, Zhao, Yu, Zhang, Yuqi

Interactive machine translation (IMT) has emerged as a progression of the computer-aided translation paradigm, where the machine translation system and the human translator collaborate to produce high-quality translations. This paper introduces Synslator, a user-friendly computer-aided translation (CAT) tool that not only supports IMT, but is adept at online learning with real-time translation memories. To accommodate various deployment environments for CAT services, Synslator integrates two different neural translation models to handle translation memories for online learning. Additionally, the system employs a language model to enhance the fluency of translations in an interactive mode. In evaluation, we have confirmed the effectiveness of online learning through the translation models, and have observed a 13% increase in post-editing efficiency with the interactive functionalities of Synslator. A tutorial video is available at:https://youtu.be/K0vRsb2lTt8.

machine translation, synslator, translation, (16 more...)

2310.05025

Country:

North America > United States > Texas > Travis County > Austin (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Iceland > Capital Region > Reykjavik (0.04)
Asia > Indonesia > Bali (0.04)

Genre:

Instructional Material (0.68)
Research Report (0.50)

Industry: Education > Educational Setting > Online (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)