AITopics | Machine Translation

Collaborating Authors

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice by using free translators such as: CAPITA, Google Translate, SDL International, SYSTRAN.

News Overviews Instructional Materials AI-Alerts Classics

Adapters for Altering LLM Vocabularies: What Languages Benefit the Most?

Han, HyoJung, Eriguchi, Akiko, Xu, Haoran, Hoang, Hieu, Carpuat, Marine, Khayrallah, Huda

arXiv.org Artificial IntelligenceOct-12-2024

Vocabulary adaptation, which integrates new vocabulary into pre-trained language models (LMs), enables expansion to new languages and mitigates token overfragmentation. However, existing approaches are limited by their reliance on heuristic or external embeddings. We propose VocADT, a novel method for vocabulary adaptation using adapter modules that are trained to learn the optimal linear combination of existing embeddings while keeping the model's weights fixed. VocADT offers a flexible and scalable solution without requiring external resources or language constraints. Across 11 languages--with various scripts, resource availability, and fragmentation--we demonstrate that VocADT outperforms the original Mistral model (Jiang et al., 2023) and other baselines across various multilingual tasks. We find that Latin-script languages and highly fragmented languages benefit the most from vocabulary adaptation. We further finetune the adapted model on the generative task of machine translation and find that vocabulary adaptation is still beneficial after fine-tuning and that VocADT is the most effective method. Vocabulary adaptation (or transfer)--a process of modifying a pre-trained language model (LM) to use a new vocabulary--offers several key advantages.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2410.09644

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Singapore (0.04)
(9 more...)

Genre:

Research Report > Promising Solution (0.66)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Multimodal and Multilingual Embeddings for Large-Scale Speech Mining

Neural Information Processing SystemsOct-11-2024, 14:25:25 GMT

We present an approach to encode a speech signal into a fixed-size representation which minimizes the cosine loss with the existing massively multilingual LASER text embedding space. Sentences are close in this embedding space, independently of their language and modality, either text or audio. Using a similarity metric in that multimodal embedding space, we perform mining of audio in German, French, Spanish and English from Librivox against billions of sentences from Common Crawl. This yielded more than twenty thousand hours of aligned speech translations. To evaluate the automatically mined speech/text corpora, we train neural speech translation systems for several languages pairs.

large-scale speech mining, multimodal and multilingual embedding, spanish and english, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Speech (0.91)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.91)

Add feedback

SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection

Neural Information Processing SystemsOct-11-2024, 07:30:39 GMT

While the self-attention mechanism has been widely used in a wide variety of tasks, it has the unfortunate property of a quadratic cost with respect to the input length, which makes it difficult to deal with long inputs. In this paper, we present a method for accelerating and structuring self-attentions: Sparse Adaptive Connection (SAC). In SAC, we regard the input sequence as a graph and attention operations are performed between linked nodes. In contrast with previous self-attention models with pre-defined structures (edges), the model learns to construct attention edges to improve task-specific performances. In this way, the model is able to select the most salient nodes and reduce the quadratic complexity regardless of the sequence length.

accelerating and structuring self-attention, sac, sparse adaptive connection, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.43)

Add feedback

A Tensorized Transformer for Language Modeling

Neural Information Processing SystemsOct-11-2024, 02:35:18 GMT

Latest development of neural models has connected the encoder and decoder through a self-attention mechanism. In particular, Transformer, which is solely based on self-attention, has led to breakthroughs in Natural Language Processing (NLP) tasks. However, the multi-head attention mechanism, as a key component of Transformer, limits the effective deployment of the model to a resource-limited setting. In this paper, based on the ideas of tensor decomposition and parameters sharing, we propose a novel self-attention model (namely Multi-linear attention) with Block-Term Tensor Decomposition (BTD). We test and verify the proposed attention method on three language modeling tasks (i.e., PTB, WikiText-103 and One-billion) and a neural machine translation task (i.e., WMT-2016 English-German).

language modeling, tensorized transformer, transformer, (3 more...)

Neural Information Processing Systems

Country: Asia > Myanmar > Tanintharyi Region > Dawei (0.09)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.71)

Add feedback

Neural Machine Translation with Soft Prototype

Neural Information Processing SystemsOct-11-2024, 00:55:04 GMT

Neural machine translation models usually use the encoder-decoder framework and generate translation from left to right (or right to left) without fully utilizing the target-side global information. A few recent approaches seek to exploit the global information through two-pass decoding, yet have limitations in translation quality and model efficiency. In this work, we propose a new framework that introduces a soft prototype into the encoder-decoder architecture, which allows the decoder to have indirect access to both past and future information, such that each target word can be generated based on the better global understanding. We further provide an efficient and effective method to generate the prototype. Empirical studies on various neural machine translation tasks show that our approach brings significant improvement in generation quality over the baseline model, with little extra cost in storage and inference time, demonstrating the effectiveness of our proposed framework.

information, neural machine translation, soft prototype, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation

Neural Information Processing SystemsOct-10-2024, 17:51:35 GMT

Sign language translation (SLT) aims to interpret sign video sequences into text-based natural language sentences. Sign videos consist of continuous sequences of sign gestures with no clear boundaries in between. Existing SLT models usually represent sign visual features in a frame-wise manner so as to avoid needing to explicitly segmenting the videos into isolated signs. However, these methods neglect the temporal information of signs and lead to substantial ambiguity in translation. In this paper, we explore the temporal semantic structures of sign videos to learn more discriminative features.

hierarchical feature learning, sign language translation, temporal semantic pyramid, (5 more...)

Neural Information Processing Systems

Industry: Education > Curriculum > Subject-Specific Education (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.75)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.64)

Add feedback

Non-Monotonic Latent Alignments for CTC-Based Non-Autoregressive Machine Translation

Neural Information Processing SystemsOct-10-2024, 15:00:24 GMT

Non-autoregressive translation (NAT) models are typically trained with the cross-entropy loss, which forces the model outputs to be aligned verbatim with the target sentence and will highly penalize small shifts in word positions. Latent alignment models relax the explicit alignment by marginalizing out all monotonic latent alignments with the CTC loss. However, they cannot handle non-monotonic alignments, which is non-negligible as there is typically global word reordering in machine translation. In this work, we explore non-monotonic latent alignments for NAT. We extend the alignment space to non-monotonic alignments to allow for the global word reordering and further consider all alignments that overlap with the target sentence.

alignment, ctc-based non-autoregressive machine translation, non-monotonic latent alignment, (4 more...)

Neural Information Processing Systems

Genre: Play > Prospect (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.79)

Add feedback

Incorporating BERT into Parallel Sequence Decoding with Adapters

Neural Information Processing SystemsOct-10-2024, 14:58:34 GMT

While large scale pre-trained language models such as BERT have achieved great success on various natural language understanding tasks, how to efficiently and effectively incorporate them into sequence-to-sequence models and the corresponding text generation tasks remains a non-trivial problem. In this paper, we propose to address this problem by taking two different BERT models as the encoder and decoder respectively, and fine-tuning them by introducing simple and lightweight adapter modules, which are inserted between BERT layers and tuned on the task-specific dataset. In this way, we obtain a flexible and efficient model which is able to jointly leverage the information contained in the source-side and target-side BERT models, while bypassing the catastrophic forgetting problem. Each component in the framework can be considered as a plug-in unit, making the framework flexible and task agnostic. Our framework is based on a parallel sequence decoding algorithm named Mask-Predict considering the bi-directional and conditional independent nature of BERT, and can be adapted to traditional autoregressive decoding easily.

adapter, incorporating bert, parallel sequence decoding, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.38)

Add feedback

R-Drop: Regularized Dropout for Neural Networks

Neural Information Processing SystemsOct-10-2024, 14:09:34 GMT

Dropout is a powerful and widely used technique to regularize the training of deep neural networks. Though effective and performing well, the randomness introduced by dropout causes unnegligible inconsistency between training and inference. In this paper, we introduce a simple consistency training strategy to regularize dropout, namely R-Drop, which forces the output distributions of different sub models generated by dropout to be consistent with each other. Specifically, for each training sample, R-Drop minimizes the bidirectional KL-divergence between the output distributions of two sub models sampled by dropout. Theoretical analysis reveals that R-Drop reduces the above inconsistency. Experiments on \bf{5} widely used deep learning tasks ( \bf{18} datasets in total), including neural machine translation, abstractive summarization, language understanding, language modeling, and image classification, show that R-Drop is universally effective.

neural network, r-drop, regularized dropout, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.61)

Add feedback

Data Diversification: A Simple Strategy For Neural Machine Translation

Neural Information Processing SystemsOct-10-2024, 12:37:24 GMT

We introduce Data Diversification: a simple but effective strategy to boost neural machine translation (NMT) performance. It diversifies the training data by using the predictions of multiple forward and backward models and then merging them with the original dataset on which the final NMT model is trained. Our method is applicable to all NMT models. It does not require extra monolingual data like back-translation, nor does it add more computations and parameters like ensembles of models. Our method achieves state-of-the-art BLEU scores of 30.7 and 43.7 in the WMT'14 English-German and English-French translation tasks, respectively.

data diversification, neural machine translation, translation task, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback