 Machine Translation


Prompting Neural Machine Translation with Translation Memories

arXiv.org Artificial Intelligence

Improving machine translation (MT) systems with translation memories (TMs) is of great interest to practitioners in the MT community. However, previous approaches require a significant update of the model architecture, additional training effort, or both to make the models well-behaved when TMs are taken as additional input. In this paper, we present a simple but effective method to introduce TMs into neural machine translation (NMT) systems. Specifically, we treat TMs as prompts to the NMT model at test time, but leave the training process unchanged. The result is a slight update of an existing NMT system, which can be implemented in a few hours by anyone who is familiar with NMT. Experimental results on several datasets demonstrate that our system significantly outperforms strong baselines.


N-Gram Nearest Neighbor Machine Translation

arXiv.org Artificial Intelligence

Nearest neighbor machine translation augments Autoregressive Translation (AT) with $k$-nearest-neighbor retrieval, by comparing the similarity between the token-level context representations of the target tokens in the query and the datastore. However, the token-level representation may introduce noise when translating ambiguous words, or fail to provide accurate retrieval results when the representation generated by the model contains indistinguishable context information, e.g., in Non-Autoregressive Translation (NAT) models. In this paper, we propose a novel $n$-gram nearest neighbor retrieval method that is model agnostic and applicable to both AT and NAT models. Specifically, we concatenate the adjacent $n$-gram hidden representations as the key, while the tuple of corresponding target tokens is the value. In inference, we propose tailored decoding algorithms for AT and NAT models respectively. We demonstrate that the proposed method consistently outperforms the token-level method on both AT and NAT models, on general as well as domain adaptation translation tasks. On domain adaptation, the proposed method brings improvements of 1.03 and 2.76 in average BLEU score on AT and NAT models respectively.
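
The key/value construction described in this abstract can be illustrated with a minimal sketch. It assumes per-position hidden states are already available as a NumPy array, and it uses a brute-force squared-L2 search; the function names `build_ngram_datastore` and `knn_lookup`, the distance choice, and the toy data are all hypothetical and are not taken from the paper, whose decoding algorithms are not reproduced here.

```python
import numpy as np

def build_ngram_datastore(hidden, tokens, n=2):
    """Build (keys, values) from one target sentence.

    hidden: (T, d) array, one hidden state per target position (assumed given).
    tokens: list of T target tokens aligned with `hidden`.
    Each key is the concatenation of n adjacent hidden states;
    each value is the tuple of the n corresponding target tokens.
    """
    hidden = np.asarray(hidden, dtype=np.float32)
    if hidden.shape[0] != len(tokens):
        raise ValueError("hidden and tokens must have the same length")
    if len(tokens) < n:
        raise ValueError("sentence is shorter than n")
    keys, values = [], []
    for i in range(len(tokens) - n + 1):
        keys.append(hidden[i:i + n].reshape(-1))  # shape (n * d,)
        values.append(tuple(tokens[i:i + n]))
    return np.stack(keys), values

def knn_lookup(query, keys, values, k=1):
    """Return the k nearest (value, squared distance) pairs by brute force."""
    dists = np.sum((keys - np.asarray(query, dtype=np.float32)) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]
    return [(values[i], float(dists[i])) for i in nearest]

# Toy example: 4 positions with 2-dimensional states.
toy_hidden = np.arange(8).reshape(4, 2)
toy_tokens = ["a", "b", "c", "d"]
toy_keys, toy_values = build_ngram_datastore(toy_hidden, toy_tokens, n=2)
toy_hits = knn_lookup(toy_keys[1], toy_keys, toy_values, k=1)
```

A real system would use an approximate nearest-neighbor index rather than a full scan; the full scan is kept here only to make the key/value layout easy to see.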


Scaling Back-Translation with Domain Text Generation for Sign Language Gloss Translation

arXiv.org Artificial Intelligence

Sign language gloss translation aims to translate sign glosses into spoken language texts, which is challenging due to the scarcity of labeled gloss-text parallel data. Back translation (BT), which generates pseudo-parallel data by translating in-domain spoken language texts into sign glosses, has been applied to alleviate the data scarcity problem. However, the lack of large-scale high-quality domain spoken language text data limits the effect of BT. In this paper, to overcome this limitation, we propose a Prompt-based domain text Generation (PGEN) approach to produce large-scale in-domain spoken language text data. Specifically, PGEN randomly concatenates sentences from the original in-domain spoken language text data as prompts to induce a pre-trained language model (i.e., GPT-2) to generate spoken language texts in a similar style. Experimental results on three benchmarks of sign language gloss translation in varied languages demonstrate that BT with spoken language texts generated by PGEN significantly outperforms the compared methods. In addition, as the scale of spoken language texts generated by PGEN increases, the BT technique can achieve further improvements, demonstrating the effectiveness of our approach. We release the code and data to facilitate future research in this field.
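
The prompt-construction step described above (random concatenation of in-domain sentences) might look roughly like the following sketch. The number of sentences per prompt, the space separator, the seed handling, and the name `build_prompts` are assumptions made for illustration; the abstract does not specify them, and the generation step with the language model is deliberately left out.

```python
import random

def build_prompts(sentences, num_prompts, sentences_per_prompt=3, seed=0):
    """Create prompts by randomly concatenating in-domain sentences.

    sentences_per_prompt and the " " separator are illustrative choices,
    not values taken from the paper.
    """
    rng = random.Random(seed)
    size = min(sentences_per_prompt, len(sentences))
    prompts = []
    for _ in range(num_prompts):
        chosen = rng.sample(sentences, k=size)  # sample without replacement
        prompts.append(" ".join(chosen))
    return prompts

# Toy in-domain pool (made-up weather-style sentences).
toy_pool = [
    "Tomorrow it will rain in the north.",
    "The night stays mostly clear.",
    "Strong winds are expected on the coast.",
    "Temperatures drop to five degrees.",
]
toy_prompts = build_prompts(toy_pool, num_prompts=5, sentences_per_prompt=2)
```

Each resulting string would then be passed to a pre-trained language model to continue in a similar style, and the continuations would feed the back-translation step.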


Efficiently Upgrading Multilingual Machine Translation Models to Support More Languages

arXiv.org Artificial Intelligence

With multilingual machine translation (MMT) models continuing to grow in size and number of supported languages, it is natural to reuse and upgrade existing models to save computation as data becomes available in more languages. However, adding new languages requires updating the vocabulary, which complicates the reuse of embeddings. The question of how to reuse existing models while also making architectural changes to provide capacity for both old and new languages has also not been closely studied. In this work, we introduce three techniques that help speed up effective learning of the new languages and alleviate catastrophic forgetting despite vocabulary and architecture mismatches. Our results show that by (1) carefully initializing the network, (2) applying learning rate scaling, and (3) performing data up-sampling, it is possible to exceed the performance of a same-sized baseline model with 30% computation and recover the performance of a larger model trained from scratch with over 50% reduction in computation. Furthermore, our analysis reveals that the introduced techniques help learn the new directions more effectively and alleviate catastrophic forgetting at the same time. We hope our work will guide research into more efficient approaches to growing languages for these MMT models and ultimately maximize the reuse of existing models.


Generating Synthetic Speech from SpokenVocab for Speech Translation

arXiv.org Artificial Intelligence

Training end-to-end speech translation (ST) systems requires sufficiently large-scale data, which is unavailable for most language pairs and domains. One practical solution to the data scarcity issue is to convert text-based machine translation (MT) data to ST data via text-to-speech (TTS) systems. Yet, using TTS systems can be tedious and slow. In this work, we propose SpokenVocab, a simple, scalable and effective data augmentation technique to convert MT data to ST data on-the-fly. The idea is to retrieve and stitch audio snippets, corresponding to words in an MT sentence, from a spoken vocabulary bank. Our experiments on multiple language pairs show that stitched speech helps to improve translation quality by an average of 1.83 BLEU score, while performing equally well as TTS-generated speech in improving translation quality. We also showcase how SpokenVocab can be applied in code-switching ST for which often no TTS systems exist.

Figure 1: Overview of generating synthetic speech from SpokenVocab on-the-fly. The first step is to prepare the SpokenVocab bank offline and the second step
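
The retrieve-and-stitch idea can be sketched as follows, assuming the spoken vocabulary bank is a plain dictionary from lowercased words to waveform arrays. Whitespace tokenization, lowercasing, the fallback snippet for out-of-vocabulary words, and the name `stitch_speech` are all hypothetical simplifications rather than details from the paper.

```python
import numpy as np

def stitch_speech(sentence, vocab_bank, unk_snippet):
    """Concatenate pre-recorded word snippets into one synthetic utterance.

    vocab_bank: dict mapping a lowercased word to a 1-D waveform array.
    unk_snippet: waveform used for words missing from the bank (an assumption).
    """
    snippets = [vocab_bank.get(word.lower(), unk_snippet) for word in sentence.split()]
    if not snippets:
        return np.zeros(0, dtype=np.float32)
    return np.concatenate(snippets)

# Toy bank with very short made-up "waveforms".
toy_bank = {
    "hello": np.array([0.1, 0.2], dtype=np.float32),
    "world": np.array([0.3], dtype=np.float32),
}
toy_unk = np.zeros(1, dtype=np.float32)
toy_audio = stitch_speech("Hello brave world", toy_bank, toy_unk)
```

Because the bank is prepared offline, stitching at training time reduces to dictionary lookups and one concatenation per sentence, which is what makes on-the-fly conversion cheap compared with running a TTS system.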


Learning Translation Quality Evaluation on Low Resource Languages from Large Language Models

arXiv.org Artificial Intelligence

Learned metrics such as BLEURT have in recent years become widely employed to evaluate the quality of machine translation systems. Training such metrics requires data which can be expensive and difficult to acquire, particularly for lower-resource languages. We show how knowledge can be distilled from Large Language Models (LLMs) to improve upon such learned metrics without requiring human annotators, by creating synthetic datasets which can be mixed into existing datasets, requiring only a corpus of text in the target language. We show that the performance of a BLEURT-like model on lower-resource languages can be improved in this way. A machine translation system is typically evaluated by comparing its output on a given input sentence with one made by a professional translator. Until recently, commonly used metrics such as BLEU (Papineni et al., 2002b) and ROUGE (Lin, 2004) were generally based on the number of co-occurring n-grams. Advantages of such methods include that they are easy to interpret, do not require learning from data, and have been shown to generally correlate with human judgement when averaged over a corpus of sentences. Nonetheless, these approaches fail when sentences are semantically similar but differ significantly in phrasing.
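
To make the n-gram co-occurrence idea concrete, here is a minimal sketch of clipped n-gram precision against a single reference, the building block of BLEU. It is not full BLEU (no brevity penalty, no geometric mean over n-gram orders), tokenization is plain whitespace splitting, and the function names and example sentences are made up for illustration.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams also found in the reference.

    Each candidate n-gram count is clipped to its count in the reference,
    so repeating a matching word cannot inflate the score.
    """
    cand = ngram_counts(candidate.split(), n)
    ref = ngram_counts(reference.split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total

# A paraphrase pair: similar meaning, little surface overlap.
toy_reference = "he resigned from his position"
toy_candidate = "he quit his job"
toy_unigram = clipped_precision(toy_candidate, toy_reference, 1)
toy_bigram = clipped_precision(toy_candidate, toy_reference, 2)
```

On this pair only half of the candidate unigrams and none of the bigrams match the reference, even though the meaning is preserved, which is the failure mode that motivates learned metrics.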


The Monitor Model and its Misconceptions: A Clarification

arXiv.org Artificial Intelligence

Horizontal (automatic) and vertical (control) processes have been observed and reported for a long time in translation production. Schaeffer and Carl's Monitor Model integrates these two processes into one framework, assuming that priming mechanisms underlie horizontal/automatic processes, while vertical/monitoring processes implement consciously accessible control mechanisms. The Monitor Model has been criticized in various ways and several misconceptions have accumulated over the past years. In this chapter, I update the Monitor Model with additional evidence and argue that it is compatible with an enactivist approach to cognition. I address several misconceptions related to the Monitor Model.


World University Law School - World University and School Wiki

#artificialintelligence

Welcome to World University and School Wiki which anyone can add to or edit. WUaS would like to offer online CLE credits with these great universities, anticipating accrediting WUaS Law Schools in 204 countries. California, the state in which WUaS is incorporated, has 12 online law schools (none of these are ABA approved, but anyone can sit the California Bar exam, regardless of such approval, as I understand it), at present, and WUaS would like to develop another online MIT OCW/Harvard-centric law school, and eventually accredit in all 204 countries in the world, in main languages in those countries, beginning with the 6 United Nations' languages. Online Law Schools Have Yet to Pass the Bar: Many argue that fully online programs aren't the path to a traditional legal career. WUaS is planning for an "Admitted Students' Day" for the first, matriculating Bachelor's degree class, on or around Saturday, April 14th, 2014, and the second Saturday of April for other degrees in the future.


MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages

arXiv.org Artificial Intelligence

While there has been a recent burgeoning of applications at the intersection of natural and programming languages, such as code generation and code summarization, these applications are usually English-centric. This creates a barrier for program developers who are not proficient in English. To mitigate this gap in technology development across languages, we propose a multilingual dataset, MCoNaLa, to benchmark code generation from natural language commands extending beyond English. Modeled off of the methodology from the English Code/Natural Language Challenge (CoNaLa) dataset, we annotated a total of 896 NL-code pairs in three languages: Spanish, Japanese, and Russian. We present a quantitative evaluation of performance on the MCoNaLa dataset by testing with state-of-the-art code generation systems. While the difficulties vary across these three languages, all systems lag significantly behind their English counterparts, revealing the challenges in adapting code generation to new languages.


DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence

arXiv.org Artificial Intelligence

Recently, there has been a growing interest in designing text generation systems from a discourse coherence perspective, e.g., modeling the interdependence between sentences. Still, recent BERT-based evaluation metrics are weak in recognizing coherence, and thus cannot reliably detect the discourse-level improvements of those text generation systems. In this work, we introduce DiscoScore, a parametrized discourse metric, which uses BERT to model discourse coherence from different perspectives, driven by Centering theory. Our experiments encompass 16 non-discourse and discourse metrics, including DiscoScore and popular coherence models, evaluated on summarization and document-level machine translation (MT). We find that (i) the majority of BERT-based metrics correlate much worse with human-rated coherence than early discourse metrics, invented a decade ago; (ii) the recent state-of-the-art BARTScore is weak when operated at system level, which is particularly problematic as systems are typically compared in this manner. DiscoScore, in contrast, achieves strong system-level correlation with human ratings, not only in coherence but also in factual consistency and other aspects, and surpasses BARTScore by over 10 correlation points on average. Further, aiming to understand DiscoScore, we provide justifications for the importance of discourse coherence for evaluation metrics, and explain the superiority of one variant over another. Our code is available at https://github.com/AIPHES/DiscoScore.