AITopics

2205.11616

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
Europe > Denmark > Capital Region > Copenhagen (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Martins, Pedro Henrique, Marinho, Zita, Martins, André F. T.

Chunk-based Nearest Neighbor Machine Translation

arXiv.org Artificial Intelligence Nov-7-2022

Semi-parametric models, which augment generation with retrieval, have led to impressive results in language modeling and machine translation, due to their ability to retrieve fine-grained information from a datastore of examples. One of the most prominent approaches, kNN-MT, exhibits strong domain adaptation capabilities by retrieving tokens from domain-specific datastores (Khandelwal et al., 2020). However, kNN-MT requires an expensive retrieval operation for every single generated token, leading to a very low decoding speed (around 8 times slower than a parametric model). In this paper, we introduce a chunk-based kNN-MT model which retrieves chunks of tokens from the datastore, instead of a single token. We propose several strategies for incorporating the retrieved chunks into the generation process, and for selecting the steps at which the model needs to search for neighbors in the datastore. Experiments on machine translation in two settings, static and "on-the-fly" domain adaptation, show that the chunk-based kNN-MT model leads to significant speed-ups (up to 4 times) with only a small drop in translation quality.
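To make the per-token retrieval cost concrete, here is a minimal sketch of the token-level kNN-MT step that the abstract describes as the slow baseline (not the chunk-based method itself). The function name, the toy datastore, and the hyperparameters `k`, `temperature` and `lam` are illustrative assumptions; a real system would use an approximate nearest-neighbor index rather than a brute-force search.

```python
import numpy as np

def knn_mt_distribution(query, keys, values, p_model, k=2, temperature=1.0, lam=0.5):
    """Interpolate a model distribution with a kNN distribution for one decoding step.

    query:   decoder hidden state for the current step, shape (d,)
    keys:    datastore keys (stored hidden states), shape (n, d)
    values:  datastore values (target token ids), shape (n,)
    p_model: the parametric model's next-token distribution, shape (vocab,)
    All names and defaults here are hypothetical.
    """
    p_model = np.asarray(p_model, dtype=float)
    # Brute-force squared L2 distances from the query to every datastore key.
    dists = np.sum((keys - query) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]
    # Softmax over negative distances: closer neighbors get more weight.
    logits = -dists[nearest] / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Accumulate neighbor weights onto the token ids they point to.
    p_knn = np.zeros_like(p_model)
    np.add.at(p_knn, values[nearest], weights)
    # Linear interpolation between retrieval and the parametric model.
    return lam * p_knn + (1.0 - lam) * p_model
```

Because this search runs once per generated token, its cost dominates decoding; retrieving a chunk of several tokens per search, as the abstract proposes, reduces how often it has to run.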

artificial intelligence, natural language, translation, (15 more...)

2205.1223

Country:

Europe > Portugal > Lisbon > Lisbon (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report (0.50)
Workflow (0.46)

Industry: Health & Medicine > Therapeutic Area > Immunology (0.67)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

GTrans: Grouping and Fusing Transformer Layers for Neural Machine Translation

Yang, Jian, Yin, Yuwei, Yang, Liqun, Ma, Shuming, Huang, Haoyang, Zhang, Dongdong, Wei, Furu, Li, Zhoujun

The Transformer architecture, a stack of encoder and decoder network layers, has driven significant progress in neural machine translation. However, the vanilla Transformer mainly exploits the top-layer representation, assuming the lower layers provide trivial or redundant information and thus ignoring potentially valuable bottom-layer features. In this work, we propose the Group-Transformer model (GTrans) that flexibly divides multi-layer representations of both encoder and decoder into different groups and then fuses these group features to generate target words. To corroborate the effectiveness of the proposed method, extensive experiments and analyses are conducted on three bilingual translation benchmarks and two multilingual translation tasks, including the IWSLT-14, IWSLT-17, LDC, WMT-14 and OPUS-100 benchmarks. Experimental and analytical results demonstrate that our model outperforms its Transformer counterparts by a consistent gain. Furthermore, it can be successfully scaled up to 60 encoder layers and 36 decoder layers.

artificial intelligence, natural language, translation, (15 more...)

doi: 10.1109/TASLP.2022.3221040

2207.14467

Country:

Asia > China > Beijing > Beijing (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Dou, Qingyun, Gales, Mark

Deliberation Networks and How to Train Them

Deliberation networks are a family of sequence-to-sequence models, which have achieved state-of-the-art performance in a wide range of tasks such as machine translation and speech synthesis. A deliberation network consists of multiple standard sequence-to-sequence models, each one conditioned on the initial input and the output of the previous model. During training, there are several key questions: whether to apply Monte Carlo approximation to the gradients or the loss, whether to train the standard models jointly or separately, whether to run an intermediate model in teacher forcing or free running mode, and whether to apply task-specific techniques. Previous work on deliberation networks typically explores one or two training options for a specific task. This work introduces a unifying framework, covering various training options, and addresses the above questions. In general, it is simpler to approximate the gradients. When parallel training is essential, separate training should be adopted. Regardless of the task, the intermediate model should be in free running mode. For tasks where the output is continuous, a guided attention loss can be used to prevent degradation into a standard model.

artificial intelligence, machine learning, natural language, (17 more...)

2211.03217

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.50)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.34)

Cumbicus-Pineda, Oscar M, Gutiérrez-Fandiño, Iker, Gonzalez-Dios, Itziar, Soroa, Aitor

Noisy Channel for Automatic Text Simplification

In this paper we present a simple re-ranking method for Automatic Sentence Simplification based on the noisy channel scheme. Instead of directly computing the best simplification given a complex text, the re-ranking method also considers the probability of the simple sentence producing its complex counterpart, as well as the probability of the simple text itself, according to a language model. Our experiments show that combining these scores outperforms the original system on three different English datasets, yielding the best known result on one of them. Adopting the noisy channel scheme opens new ways to infuse additional information into ATS systems, and thus to control important aspects of them, a known limitation of end-to-end neural seq2seq generative models.
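The three scores named in the abstract can be combined in several ways; the sketch below assumes a weighted sum of log-probabilities, a common log-linear choice for noisy channel re-ranking, and is not necessarily the authors' exact formula. The `Candidate` fields, the weights `w_channel` and `w_lm`, and the example numbers are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    direct_logp: float   # log p(simple | complex), from the original system
    channel_logp: float  # log p(complex | simple), from a reverse model
    lm_logp: float       # log p(simple), from a language model

def noisy_channel_score(c, w_channel=1.0, w_lm=1.0):
    # Assumed combination: weighted sum of the three log-probabilities.
    return c.direct_logp + w_channel * c.channel_logp + w_lm * c.lm_logp

def rerank(candidates, w_channel=1.0, w_lm=1.0):
    # Highest combined score first.
    return sorted(
        candidates,
        key=lambda c: noisy_channel_score(c, w_channel, w_lm),
        reverse=True,
    )

# Toy n-best list with made-up scores.
nbest = [
    Candidate("candidate A", direct_logp=-1.0, channel_logp=-5.0, lm_logp=-4.0),
    Candidate("candidate B", direct_logp=-1.5, channel_logp=-2.0, lm_logp=-3.0),
]
best = rerank(nbest)[0]
```

With both weights set to zero the ranking falls back to the original system's order, which makes the contribution of the channel and language model terms easy to isolate.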

artificial intelligence, natural language, simplification, (14 more...)

2211.03152

Country:

Africa > Middle East > Egypt (0.05)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > United Kingdom > Scotland > City of Aberdeen > Aberdeen (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (0.48)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.47)

Dou, Qingyun, Gales, Mark

Parallel Attention Forcing for Machine Translation

Attention-based autoregressive models have achieved state-of-the-art performance in various sequence-to-sequence tasks, including Text-To-Speech (TTS) and Neural Machine Translation (NMT), but can be difficult to train. The standard training approach, teacher forcing, guides a model with the reference back-history. During inference, the generated back-history must be used. This mismatch limits the evaluation performance. Attention forcing has been introduced to address the mismatch, guiding the model with the generated back-history and reference attention. While successful in tasks with continuous outputs like TTS, attention forcing faces additional challenges in tasks with discrete outputs like NMT. This paper introduces two extensions of attention forcing to tackle these challenges. (1) Scheduled attention forcing automatically turns attention forcing on and off, which is essential for tasks with discrete outputs. (2) Parallel attention forcing makes training parallel, and is applicable to Transformer-based models. The experiments show that the proposed approaches improve the performance of models based on RNNs and Transformers.

machine learning, natural language, translation, (18 more...)

2211.03237

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Industry:

Education (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Journal of Artificial Intelligence Research Nov-6-2022

AAN+: Generalized Average Attention Network for Accelerating Neural Transformer

Zhang, Biao (University of Edinburgh) | Xiong, Deyi | Ge, Yubin | Yao, Junfeng | Yue, Hao | Su, Jinsong

Transformer benefits from the high parallelization of attention networks in fast training, but it still suffers from slow decoding, partially due to the linear dependency O(m) of the decoder self-attention on previous target words at inference. In this paper, we propose a generalized average attention network (AAN+) aiming at speeding up decoding by reducing the dependency from O(m) to O(1). We find that the learned self-attention weights in the decoder follow some patterns which can be approximated via a dynamic structure. Based on this insight, we develop AAN+, extending our previously proposed average attention (Zhang et al., 2018a, AAN) to support more general position- and content-based attention patterns. AAN+ only needs to maintain a small constant number of hidden states during decoding, ensuring its O(1) dependency. We apply AAN+ as a drop-in replacement of the decoder self-attention and conduct experiments on machine translation (with diverse language pairs), table-to-text generation and document summarization. With masking tricks and dynamic programming, AAN+ enables Transformer to decode sentences around 20% faster without substantially compromising training speed or generation performance. Our results further reveal the importance of localness (neighboring words) in AAN+ and its capability in modeling long-range dependency.
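The O(1) claim is easiest to see for a plain cumulative average over previous decoder inputs, the simplest position-based pattern of this kind. The sketch below illustrates only that simple case, not AAN+ itself; the class and function names are hypothetical. At inference a running sum is updated once per step, while at training time the same averages for all positions can be computed in parallel.

```python
import numpy as np

class RunningAverageState:
    """Constant-size decoding state: a running sum and a step counter."""

    def __init__(self, dim):
        self.total = np.zeros(dim)
        self.steps = 0

    def update(self, y):
        # One O(1) update per generated token, independent of the prefix length.
        self.total = self.total + y
        self.steps += 1
        return self.total / self.steps

def cumulative_average(ys):
    """Training-time equivalent: averages for every position at once."""
    ys = np.asarray(ys, dtype=float)
    counts = np.arange(1, len(ys) + 1)[:, None]
    return np.cumsum(ys, axis=0) / counts
```

Both paths produce the same vectors, so the parallel form can be used in training and the incremental form in decoding.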

dependency, transformer, translation, (14 more...)

doi: 10.1613/jair.1.13896

AI Access Foundation

Country:

Europe > United Kingdom (0.14)
Asia > China > Fujian Province > Xiamen (0.05)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)

Genre: Research Report > New Finding (0.88)

Industry:

Leisure & Entertainment (0.67)
Media > Television (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

#artificialintelligence Nov-5-2022, 12:25:30 GMT

Top Language Translation AI To Watch in 2022

When it comes to languages, many problems arise in typical translation services. Either the grammar is bad or the translation does not completely make sense afterward. It is essential that these mistakes do not slip through into the final translation, whether it's during a business transaction or simply a conversation. Luckily, technology has advanced this process with the help of automation and artificial intelligence, assisting with speed and accuracy. In this article, we will discuss some of the most prominent and up-and-coming companies that provide these automated solutions that break down the language barrier.

top language translation ai, translation, translation service, (13 more...)

#artificialintelligence

Industry: Law (0.36)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

arXiv.org Artificial Intelligence Nov-4-2022

Non-Parametric Domain Adaptation for End-to-End Speech Translation

Du, Yichao, Wang, Weizhi, Zhang, Zhirui, Chen, Boxing, Xu, Tong, Xie, Jun, Chen, Enhong

End-to-End Speech Translation (E2E-ST) has received increasing attention due to its potential for less error propagation, lower latency, and fewer parameters. However, the effectiveness of neural-based approaches to this task is severely limited by the available training corpus, especially for domain adaptation where in-domain triplet training data is scarce or nonexistent. In this paper, we propose a novel non-parametric method that leverages a domain-specific text translation corpus to achieve domain adaptation for the E2E-ST system. To this end, we first incorporate an additional encoder into the pre-trained E2E-ST model to realize text translation modelling, and then unify the decoder's output representation for text and speech translation tasks by reducing the corresponding representation mismatch in available triplet training data. During domain adaptation, a k-nearest-neighbor (kNN) classifier is introduced to produce the final translation distribution using the external datastore built from the domain-specific text translation corpus, while the universal output representation is adopted to perform a similarity search. Experiments on the Europarl-ST benchmark demonstrate that when only in-domain text translation data is involved, our proposed approach significantly improves the baseline by 12.82 BLEU on average in all translation directions, even outperforming the strong in-domain fine-tuning method.

machine learning, natural language, translation, (17 more...)

2205.11211

Country:

Asia > China (0.04)
North America > United States > California > Santa Barbara County > Santa Barbara (0.04)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)

Ferrando, Javier, Gállego, Gerard I., Alastruey, Belen, Escolano, Carlos, Costa-jussà, Marta R.

Towards Opening the Black Box of Neural Machine Translation: Source and Target Interpretations of the Transformer

arXiv.org Artificial Intelligence Nov-4-2022

In Neural Machine Translation (NMT), each token prediction is conditioned on the source sentence and the target prefix (what has been previously translated at a decoding step). However, previous work on interpretability in NMT has focused mainly on the attributions of source sentence tokens. Therefore, we lack a full understanding of the influences of every input token (source sentence and target prefix) in the model predictions. In this work, we propose an interpretability method that tracks input tokens' attributions for both contexts. Our method, which can be extended to any encoder-decoder Transformer-based model, allows us to better comprehend the inner workings of current NMT models. We apply the proposed method to both bilingual and multilingual Transformers and present insights into their behaviour.

artificial intelligence, contribution, natural language, (14 more...)

2205.11631

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Dominican Republic (0.04)
Europe > Italy > Tuscany > Florence (0.04)

Genre: Research Report (0.82)

Industry: Transportation > Air (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)