AITopics

2303.01793

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
(21 more...)

Genre:

Research Report > Promising Solution (0.86)
Overview > Innovation (0.54)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Sadrizadeh, Sahar, Aghdam, AmirHossein Dabiri, Dolamic, Ljiljana, Frossard, Pascal

Targeted Adversarial Attacks against Neural Machine Translation

Neural Machine Translation (NMT) systems are used in various applications. However, it has been shown that they are vulnerable to very small perturbations of their inputs, known as adversarial attacks. In this paper, we propose a new targeted adversarial attack against NMT models. In particular, our goal is to insert a predefined target keyword into the translation of the adversarial sentence while maintaining similarity between the original sentence and the perturbed one in the source domain. To this aim, we propose an optimization problem, including an adversarial loss term and a similarity term. We use gradient projection in the embedding space to craft an adversarial sentence. Experimental results show that our attack outperforms Seq2Sick, the other targeted adversarial attack against NMT models, in terms of success rate and decrease in translation quality. Our attack succeeds in inserting a keyword into the translation for more than 75% of sentences while similarity with the original sentence stays preserved.

machine learning, natural language, translation, (18 more...)

2303.01068

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(12 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Viterbi Decoding of Directed Acyclic Transformer for Non-Autoregressive Machine Translation

Shao, Chenze, Ma, Zhengrui, Feng, Yang

Non-autoregressive models achieve significant decoding speedup in neural machine translation but lack the ability to capture sequential dependency. Directed Acyclic Transformer (DA-Transformer) was recently proposed to model sequential dependency with a directed acyclic graph. Consequently, it has to apply a sequential decision process at inference time, which harms the global translation accuracy. In this paper, we present a Viterbi decoding framework for DA-Transformer, which guarantees to find the joint optimal solution for the translation and decoding path under any length constraint. Experimental results demonstrate that our approach consistently improves the performance of DA-Transformer while maintaining a similar decoding speedup.

artificial intelligence, natural language, translation, (15 more...)

2210.05193

Country:

Europe > Belgium > Brussels-Capital Region > Brussels (0.05)
North America > Dominican Republic (0.04)
Asia > China > Hong Kong (0.04)
(8 more...)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Letz Translate: Low-Resource Machine Translation for Luxembourgish

Song, Yewei, Ezzini, Saad, Klein, Jacques, Bissyande, Tegawende, Lefebvre, Clément, Goujon, Anne

Natural language processing of Low-Resource Languages (LRL) is often challenged by the lack of data. Therefore, achieving accurate machine translation (MT) in a low-resource environment is a real problem that requires practical solutions. Research in multilingual models have shown that some LRLs can be handled with such models. However, their large size and computational needs make their use in constrained environments (e.g., mobile/IoT devices or limited/old servers) impractical. In this paper, we address this problem by leveraging the power of large multilingual MT models using knowledge distillation. Knowledge distillation can transfer knowledge from a large and complex teacher model to a simpler and smaller student model without losing much in performance. We also make use of high-resource languages that are related or share the same linguistic root as the target LRL. For our evaluation, we consider Luxembourgish as the LRL that shares some roots and properties with German. We build multiple resource-efficient models based on German, knowledge distillation from the multilingual No Language Left Behind (NLLB) model, and pseudo-translation. We find that our efficient models are more than 30\% faster and perform only 4\% lower compared to the large state-of-the-art NLLB model.

artificial intelligence, distillation, natural language, (15 more...)

2303.01347

Country:

Europe > Switzerland (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)
Europe > Netherlands (0.04)
Europe > Finland (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Banerjee, Tamali, Murthy, Rudra V, Bhattacharyya, Pushpak

Denoising-based UNMT is more robust to word-order divergence than MASS-based UNMT

We aim to investigate whether UNMT approaches with self-supervised pre-training are robust to word-order divergence between language pairs. We achieve this by comparing two models pre-trained with the same self-supervised pre-training objective. The first model is trained on language pairs with different word-orders, and the second model is trained on the same language pairs with source language re-ordered to match the word-order of the target language. Ideally, UNMT approaches which are robust to word-order divergence should exhibit no visible performance difference between the two configurations. In this paper, we investigate two such self-supervised pre-training based UNMT approaches, namely Masked Sequence-to-Sequence Pre-Training, (MASS) (which does not have shuffling noise) and Denoising AutoEncoder (DAE), (which has shuffling noise). We experiment with five English$\rightarrow$Indic language pairs, i.e., en-hi, en-bn, en-gu, en-kn, and en-ta) where word-order of the source language is SVO (Subject-Verb-Object), and the word-order of the target languages is SOV (Subject-Object-Verb). We observed that for these language pairs, DAE-based UNMT approach consistently outperforms MASS in terms of translation accuracies. Moreover, bridging the word-order gap using reordering improves the translation accuracy of MASS-based UNMT models, while it cannot improve the translation accuracy of DAE-based UNMT models. This observation indicates that DAE-based UNMT is more robust to word-order divergence than MASS-based UNMT. Word-shuffling noise in DAE approach could be the possible reason for the approach being robust to word-order divergence.

machine learning, natural language, translation, (16 more...)

2303.01191

Country:

Asia > India (0.06)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation

Huang, Rongjie, Liu, Jinglin, Liu, Huadai, Ren, Yi, Zhang, Lichao, He, Jinzheng, Zhao, Zhou

Specifically, a sequence of discrete representations derived in a self-supervised manner are predicted from the model and passed to a vocoder for speech reconstruction, while still facing the following challenges: 1) Acoustic multimodality: the discrete units derived from speech with same content could be indeterministic due to the acoustic property (e.g., rhythm, pitch, and energy), which causes deterioration of translation accuracy; 2) high latency: current S2ST systems utilize autoregressive models which predict each unit conditioned on the sequence previously generated, failing to take full advantage of parallelism. In this work, we propose TranSpeech, a speech-to-speech translation model with bilateral perturbation. To alleviate the acoustic multimodal problem, we propose bilateral perturbation (BiP), which consists of the style normalization and information enhancement stages, to learn only the linguistic information from speech samples and generate more deterministic representations. With reduced multimodality, we step forward and become the first to establish a non-autoregressive S2ST technique, which repeatedly masks and predicts unit choices and produces high-accuracy results in just a few cycles. Experimental results on three language pairs demonstrate that BiP yields an improvement of 2.9 BLEU on average compared with a baseline textless S2ST model. Moreover, our parallel decoding shows a significant reduction of inference latency, enabling speedup up to 21.4x than autoregressive technique. Speech-to-speech translation (S2ST) aims at converting speech from one language into speech in another, significantly breaking down communication barriers between people not sharing a common language. Among the conventional method (Lavie et al., 1997; Nakamura et al., 2006; Wahlster, 2013), the cascaded system of automatic speech recognition (ASR), machine translation (MT), or speech-to-text translation (S2T) followed by text-to-speech synthesis (TTS) have demonstrated reasonable results yet suffering from expensive computational costs.

artificial intelligence, arxiv preprint arxiv, natural language, (16 more...)

2205.12523

Country:

Asia > China (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

arXiv.org Artificial IntelligenceMar-1-2023

Understanding Natural Language Understanding Systems. A Critical Analysis

Lenci, Alessandro

The development of machines that {\guillemotleft}talk like us{\guillemotright}, also known as Natural Language Understanding (NLU) systems, is the Holy Grail of Artificial Intelligence (AI), since language is the quintessence of human intelligence. The brief but intense life of NLU research in AI and Natural Language Processing (NLP) is full of ups and downs, with periods of high hopes that the Grail is finally within reach, typically followed by phases of equally deep despair and disillusion. But never has the trust that we can build {\guillemotleft}talking machines{\guillemotright} been stronger than the one engendered by the last generation of NLU systems. But is it gold all that glitters in AI? do state-of-the-art systems possess something comparable to the human knowledge of language? Are we at the dawn of a new era, in which the Grail is finally closer to us? In fact, the latest achievements of AI systems have sparkled, or better renewed, an intense scientific debate on their true language understanding capabilities. Some defend the idea that, yes, we are on the right track, despite the limits that computational models still show. Others are instead radically skeptic and even dismissal: The present limits are not just contingent and temporary problems of NLU systems, but the sign of the intrinsic inadequacy of the epistemological and technological paradigm grounding them. This paper aims at contributing to such debate by carrying out a critical analysis of the linguistic abilities of the most recent NLU systems. I contend that they incorporate important aspects of the way language is learnt and processed by humans, but at the same time they lack key interpretive and inferential skills that it is unlikely they can attain unless they are integrated with structured knowledge and the ability to exploit it for language use.

large language model, machine learning, natural language, (25 more...)

2303.04229

Country:

Oceania > Australia (0.04)
North America > United States > New York (0.04)
North America > United States > Indiana > Bartholomew County > Columbus (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry:

Media > News (0.46)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

Zhang, Shaolei, Feng, Yang

Hidden Markov Transformer for Simultaneous Machine Translation

arXiv.org Artificial IntelligenceMar-1-2023

Simultaneous machine translation (SiMT) outputs the target sequence while receiving the source sequence, and hence learning when to start translating each target token is the core challenge for SiMT task. However, it is non-trivial to learn the optimal moment among many possible moments of starting translating, as the moments of starting translating always hide inside the model and can only be supervised with the observed target sequence. In this paper, we propose a Hidden Markov Transformer (HMT), which treats the moments of starting translating as hidden events and the target sequence as the corresponding observed events, thereby organizing them as a hidden Markov model. HMT explicitly models multiple moments of starting translating as the candidate hidden events, and then selects one to generate the target token. During training, by maximizing the marginal likelihood of the target sequence over multiple moments of starting translating, HMT learns to start translating at the moments that target tokens can be generated more accurately. Recently, with the increase of real-time scenarios such as live broadcasting, video subtitles and conferences, simultaneous machine translation (SiMT) attracts more attention (Cho & Esipova, 2016; Gu et al., 2017; Ma et al., 2019; Arivazhagan et al., 2019), which requires the model to receive source token one by one and simultaneously generates the target tokens. For the purpose of high-quality translation under low latency, SiMT model needs to learn when to start translating each target token (Gu et al., 2017), thereby making a wise decision between waiting for the next source token (i.e., READ action) and generating a target token (i.e., WRITE action) during the translation process. However, learning when to start translating target tokens is not trivial for a SiMT model, as the moments of starting translating always hide inside the model and we can only supervise the SiMT model with the observed target sequence (Zhang & Feng, 2022a).

artificial intelligence, machine learning, natural language, (16 more...)

2303.00257

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Dominican Republic (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(13 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Sharami, J. Pourmostafa Roshan, Shterionov, D., Spronck, P.

A Systematic Analysis of Vocabulary and BPE Settings for Optimal Fine-tuning of NMT: A Case Study of In-domain Translation

arXiv.org Artificial IntelligenceMar-1-2023

The effectiveness of Neural Machine Translation (NMT) models largely depends on the vocabulary used at training; small vocabularies can lead to out-of-vocabulary problems -- large ones, to memory issues. Subword (SW) tokenization has been successfully employed to mitigate these issues. The choice of vocabulary and SW tokenization has a significant impact on both training and fine-tuning an NMT model. Fine-tuning is a common practice in optimizing an MT model with respect to new data. However, new data potentially introduces new words (or tokens), which, if not taken into consideration, may lead to suboptimal performance. In addition, the distribution of tokens in the new data can differ from the distribution of the original data. As such, the original SW tokenization model could be less suitable for the new data. Through a systematic empirical evaluation, in this work we compare different strategies for SW tokenization and vocabulary generation with the ultimate goal to uncover an optimal setting for fine-tuning a domain-specific model. Furthermore, we developed several (in-domain) models, the best of which achieves 6 BLEU points improvement over the baseline.

artificial intelligence, machine translation, natural language, (14 more...)

2303.00722

Country:

Europe > Portugal > Lisbon > Lisbon (0.04)
Europe > Netherlands (0.04)
Asia > Vietnam > Da Nang > Da Nang (0.04)
(10 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Ramesh, Krithika, Sitaram, Sunayana, Choudhury, Monojit

Fairness in Language Models Beyond English: Gaps and Challenges

arXiv.org Artificial IntelligenceFeb-28-2023

With language models becoming increasingly ubiquitous, it has become essential to address their inequitable treatment of diverse demographic groups and factors. Most research on evaluating and mitigating fairness harms has been concentrated on English, while multilingual models and non-English languages have received comparatively little attention. This paper presents a survey of fairness in multilingual and non-English contexts, highlighting the shortcomings of current research and the difficulties faced by methods designed for English. We contend that the multitude of diverse cultures and languages across the world makes it infeasible to achieve comprehensive coverage in terms of constructing fairness datasets. Thus, the measurement and mitigation of biases must evolve beyond the current dataset-driven practices that are narrowly focused on specific dimensions and types of biases and, therefore, impossible to scale across languages and cultures.

computational linguistic, large language model, natural language, (19 more...)

2302.12578

Country:

North America > United States > Washington > King County > Seattle (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.05)
(23 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)