Machine Translation
A Closer Look At Translation Hub: Enterprise Translation Made Easy - Liwaiwai
In this article, we'll take a closer look at Translation Hub's powerful features, and the ways it is helping customers do more with their content. Translation Hub is a fully-managed, self-serve translation offering, powered by Google AI and built for the enterprise. With Translation Hub, businesses can instantaneously translate content into 135 languages with a single click, via an intuitive interface that integrates human reviews (i.e., a "human in the loop") where required. Organizations need to be able to share the output of AI-powered translation with localization teams or agencies, for review. They need to save time by leveraging glossaries or customer machine learning (ML) models.
Attention-Based Models for Speech Recognition
Recurrent sequence generators conditioned on input data through an attention mechanism have recently shown very good performance on a range of tasks including machine translation, handwriting synthesis [1, 2] and image caption generation [3]. We extend the attention-mechanism with features needed for speech recognition. We show that while an adaptation of the model used for machine translation in [2] reaches a competitive 18.7% phoneme error rate (PER) on the TIMIT phoneme recognition task, it can only be applied to utterances which are roughly as long as the ones it was trained on. We offer a qualitative explanation of this failure and propose a novel and generic method of adding location-awareness to the attention mechanism to alleviate this issue. The new method yields a model that is robust to long inputs and achieves 18% PER in single utterances and 20% in 10-times longer (repeated) utterances. Finally, we propose a change to the attention mechanism that prevents it from concentrating too much on single frames, which further reduces PER to 17.6% level.
Enhancing Speech-to-Speech Translation with Multiple TTS Targets
Shi, Jiatong, Tang, Yun, Lee, Ann, Inaguma, Hirofumi, Wang, Changhan, Pino, Juan, Watanabe, Shinji
It has been known that direct speech-to-speech translation (S2ST) models usually suffer from the data scarcity issue because of the limited existing parallel materials for both source and target speech. Therefore to train a direct S2ST system, previous works usually utilize text-to-speech (TTS) systems to generate samples in the target language by augmenting the data from speech-to-text translation (S2TT). However, there is a limited investigation into how the synthesized target speech would affect the S2ST models. In this work, we analyze the effect of changing synthesized target speech for direct S2ST models. We find that simply combining the target speech from different TTS systems can potentially improve the S2ST performances. Following that, we also propose a multi-task framework that jointly optimizes the S2ST system with multiple targets from different TTS systems. Extensive experiments demonstrate that our proposed framework achieves consistent improvements (2.8 BLEU) over the baselines on the Fisher Spanish-English dataset.
DISTO: Evaluating Textual Distractors for Multi-Choice Questions using Negative Sampling based Approach
Multiple choice questions (MCQs) are an efficient and common way to assess reading comprehension (RC). Every MCQ needs a set of distractor answers that are incorrect, but plausible enough to test student knowledge. Distractor generation (DG) models have been proposed, and their performance is typically evaluated using machine translation (MT) metrics. However, MT metrics often misjudge the suitability of generated distractors. We propose DISTO: the first learned evaluation metric for generated distractors. We validate DISTO by showing its scores correlate highly with human ratings of distractor quality. At the same time, DISTO ranks the performance of stateof-the-art Figure 1: A multi-choice question example from the DG models very differently from RACE dataset (Lai et al., 2017). The generated distractors MT-based metrics, showing that MT metrics were produced using a T5 model. Though the should not be used for distractor evaluation.
Bridging Graph Position Encodings for Transformers with Weighted Graph-Walking Automata
A current goal in the graph neural network literature is to enable transformers to operate on graph-structured data, given their success on language and vision tasks. Since the transformer's original sinusoidal position encodings (PEs) are not applicable to graphs, recent work has focused on developing graph PEs, rooted in spectral graph theory or various spatial features of a graph. In this work, we introduce a new graph PE, Graph Automaton PE (GAPE), based on weighted graph-walking automata (a novel extension of graph-walking automata). We compare the performance of GAPE with other PE schemes on both machine translation and graph-structured tasks, and we show that it generalizes and connects with several other PEs. An additional contribution of this study is a theoretical and controlled experimental comparison of many recent PEs in graph transformers, independent of the use of edge features.
Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder
Fu, Zihao, Lam, Wai, Yu, Qian, So, Anthony Man-Cho, Hu, Shengding, Liu, Zhiyuan, Collier, Nigel
The sequence-to-sequence (seq2seq) task aims at generating the target sequence based on the given input source sequence. Traditionally, most of the seq2seq task is resolved by the Encoder-Decoder framework which requires an encoder to encode the source sequence and a decoder to generate the target text. Recently, a bunch of new approaches have emerged that apply decoder-only language models directly to the seq2seq task. Despite the significant advancements in applying language models to the seq2seq task, there is still a lack of thorough analysis on the effectiveness of the decoder-only language model architecture. This paper aims to address this gap by conducting a detailed comparison between the encoder-decoder architecture and the decoder-only language model framework through the analysis of a regularized encoder-decoder structure. This structure is designed to replicate all behaviors in the classical decoder-only language model but has an encoder and a decoder making it easier to be compared with the classical encoder-decoder structure. Based on the analysis, we unveil the attention degeneration problem in the language model, namely, as the generation step number grows, less and less attention is focused on the source sequence. To give a quantitative understanding of this problem, we conduct a theoretical sensitivity analysis of the attention output with respect to the source input. Grounded on our analysis, we propose a novel partial attention language model to solve the attention degeneration problem. Experimental results on machine translation, summarization, and data-to-text generation tasks support our analysis and demonstrate the effectiveness of our proposed model.
tmn at SemEval-2023 Task 9: Multilingual Tweet Intimacy Detection using XLM-T, Google Translate, and Ensemble Learning
The paper describes a transformer-based system designed for SemEval-2023 Task 9: Multilingual Tweet Intimacy Analysis. The purpose of the task was to predict the intimacy of tweets in a range from 1 (not intimate at all) to 5 (very intimate). The official training set for the competition consisted of tweets in six languages (English, Spanish, Italian, Portuguese, French, and Chinese). The test set included the given six languages as well as external data with four languages not presented in the training set (Hindi, Arabic, Dutch, and Korean). We presented a solution based on an ensemble of XLM-T, a multilingual RoBERTa model adapted to the Twitter domain. To improve the performance of unseen languages, each tweet was supplemented by its English translation. We explored the effectiveness of translated data for the languages seen in fine-tuning compared to unseen languages and estimated strategies for using translated data in transformer-based models. Our solution ranked 4th on the leaderboard while achieving an overall Pearson's r of 0.599 over the test set. The proposed system improves up to 0.088 Pearson's r over a score averaged across all 45 submissions.
Scotland - World University and School Wiki
Welcome to World University and School Wiki which anyone can add to or edit. See, too, the British Film Institute. If you were Scotland and heading for independence with a vote in the British Isles in 2014 or beyond, which currency would you choose for Scotland's long term prosperity, - institutional-wise, especially (e.g. "Like many Scots, I can clearly distinguish between independence and nationalism, and I certainly wouldn't be voting for nationalism, certainly not for tartan-la-la. Really I'd want a yes vote, then a bloodless coup the next morning, before there were any flags or triumphalism."
JANUS: Speech-to-Speech Translation Using Connectionist and Non-Connectionist Techniques
JANUS translates continuously spoken English and German into German, English, and Japanese. JANUS cur(cid:173) rently achieves 87% translation fidelity from English speech and 97% from German speech. We present the JANUS system along with com(cid:173) parative evaluations of its interchangeable processing components, with special emphasis on the connectionist modules.
Learning from Multiple Partially Observed Views - an Application to Multilingual Text Categorization
We address the problem of learning classifiers when observations have multiple views, some of which may not be observed for all examples. We assume the existence of view generating functions which may complete the missing views in an approximate way. This situation corresponds for example to learning text classifiers from multilingual collections where documents are not available in all languages. In that case, Machine Translation (MT) systems may be used to translate each document in the missing languages. We derive a generalization error bound for classifiers learned on examples with multiple artificially created views.