 Machine Translation


Generating Synthetic Speech from SpokenVocab for Speech Translation

arXiv.org Artificial Intelligence

Training end-to-end speech translation (ST) systems requires sufficiently large-scale data, which is unavailable for most language pairs and domains. One practical solution to the data scarcity issue is to convert text-based machine translation (MT) data to ST data via text-to-speech (TTS) systems. Yet, using TTS systems can be tedious and slow. In this work, we propose SpokenVocab, a simple, scalable and effective data augmentation technique to convert MT data to ST data on-the-fly. The idea is to retrieve and stitch audio snippets, corresponding to words in an MT sentence, from a spoken vocabulary bank. Our experiments on multiple language pairs show that stitched speech helps to improve translation quality by an average of 1.83 BLEU score, while performing equally well as TTS-generated speech in improving translation quality. We also showcase how SpokenVocab can be applied in code-switching ST, for which often no TTS systems exist.

Figure 1: Overview of generating synthetic speech from SpokenVocab on-the-fly. The first step is to prepare the SpokenVocab bank offline and the second step …
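The retrieve-and-stitch idea in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `stitch_speech`, the representation of the bank as a dictionary from lowercase words to lists of audio samples, and the handling of out-of-vocabulary words with a fallback snippet. The abstract does not specify these details, so this is not the authors' implementation.

```python
from typing import Dict, List, Sequence


def stitch_speech(
    sentence: str,
    bank: Dict[str, List[float]],
    unk: Sequence[float] = (),
) -> List[float]:
    """Concatenate per-word audio snippets into one synthetic utterance.

    `bank` is a hypothetical spoken vocabulary bank prepared offline;
    `unk` is a hypothetical fallback snippet for words missing from it.
    """
    waveform: List[float] = []
    for word in sentence.lower().split():
        # Retrieve the snippet for this word, then append it to the output.
        waveform.extend(bank.get(word, unk))
    return waveform


# Toy bank: real snippets would be audio samples, not two or three floats.
bank = {"hello": [0.1, 0.2], "world": [0.3]}
stitched = stitch_speech("Hello world", bank)
```

Because the lookup is a dictionary access and a concatenation, a step like this could plausibly run inside a data loader, which is one way to read the "on-the-fly" claim.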


Learning Translation Quality Evaluation on Low Resource Languages from Large Language Models

arXiv.org Artificial Intelligence

Learned metrics such as BLEURT have in recent years become widely employed to evaluate the quality of machine translation systems. Training such metrics requires data which can be expensive and difficult to acquire, particularly for lower-resource languages. We show how knowledge can be distilled from Large Language Models (LLMs) to improve upon such learned metrics without requiring human annotators, by creating synthetic datasets which can be mixed into existing datasets, requiring only a corpus of text in the target language. We show that the performance of a BLEURT-like model on lower-resource languages can be improved in this way. A machine translation system is typically evaluated by comparing its output on a given input sentence with one made by a professional translator. Until recently, commonly used metrics such as BLEU (Papineni et al., 2002b) and ROUGE (Lin, 2004) were generally based on the number of co-occurring n-grams. Advantages of such methods include that they are easy to interpret, do not require learning from data, and have been shown to generally correlate with human judgement when averaged over a corpus of sentences. Nonetheless, these approaches fail when sentences are semantically similar but differ significantly in phrasing.
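To make the n-gram failure mode concrete, here is a minimal sketch of clipped n-gram precision, the core quantity behind BLEU-style metrics. It is a simplification: full BLEU also combines several n-gram orders and applies a brevity penalty, both omitted here. The function names and example sentences are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(hypothesis: str, reference: str, n: int = 1) -> float:
    """Fraction of hypothesis n-grams that also occur in the reference.

    Counts are clipped so a repeated hypothesis n-gram is credited at
    most as often as it appears in the reference.
    """
    hyp = ngrams(hypothesis.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return overlap / total


reference = "the cat sat on the mat"
paraphrase = "a feline was sitting on the rug"

unigram_score = ngram_precision(paraphrase, reference, n=1)  # 2 of 7 unigrams match
bigram_score = ngram_precision(paraphrase, reference, n=2)   # 1 of 6 bigrams matches
```

The paraphrase carries roughly the same meaning as the reference yet shares only "on" and "the" with it, so both scores are low. This is the gap that learned metrics aim to close.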


The Monitor Model and its Misconceptions: A Clarification

arXiv.org Artificial Intelligence

Horizontal (automatic) and vertical (control) processes have been observed and reported for a long time in translation production. Schaeffer and Carl's Monitor Model integrates these two processes into one framework, assuming that priming mechanisms underlie horizontal/automatic processes, while vertical/monitoring processes implement consciously accessible control mechanisms. The Monitor Model has been criticized in various ways and several misconceptions have accumulated over the past years. In this chapter, I update the Monitor Model with additional evidence and argue that it is compatible with an enactivist approach to cognition. I address several misconceptions related to the Monitor Model.


World University Law School - World University and School Wiki

#artificialintelligence

Welcome to World University and School Wiki which anyone can add to or edit. WUaS would like to offer online CLE credits with these great universities, anticipating accrediting WUaS Law Schools in 204 countries. California, the state in which WUaS is incorporated, has 12 online law schools (none of these are ABA approved, but anyone can sit the California Bar exam, regardless of such approval, as I understand it), at present, and WUaS would like to develop another online MIT OCW/Harvard-centric law school, and eventually accredit in all 204 countries in the world, in main languages in those countries, beginning with the 6 United Nations' languages. Online Law Schools Have Yet to Pass the Bar: Many argue that fully online programs aren't the path to a traditional legal career. WUaS is planning for an "Admitted Students' Day" for the first, matriculating Bachelor's degree class, on or around Saturday, April 14th, 2014, and the second Saturday of April for other degrees in the future.


MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages

arXiv.org Artificial Intelligence

While there has been a recent burgeoning of applications at the intersection of natural and programming languages, such as code generation and code summarization, these applications are usually English-centric. This creates a barrier for program developers who are not proficient in English. To mitigate this gap in technology development across languages, we propose a multilingual dataset, MCoNaLa, to benchmark code generation from natural language commands extending beyond English. Modeled off of the methodology from the English Code/Natural Language Challenge (CoNaLa) dataset, we annotated a total of 896 NL-code pairs in three languages: Spanish, Japanese, and Russian. We present a quantitative evaluation of performance on the MCoNaLa dataset by testing with state-of-the-art code generation systems. While the difficulties vary across these three languages, all systems lag significantly behind their English counterparts, revealing the challenges in adapting code generation to new languages.


DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence

arXiv.org Artificial Intelligence

Recently, there has been a growing interest in designing text generation systems from a discourse coherence perspective, e.g., modeling the interdependence between sentences. Still, recent BERT-based evaluation metrics are weak in recognizing coherence, and thus are not reliable in a way to spot the discourse-level improvements of those text generation systems. In this work, we introduce DiscoScore, a parametrized discourse metric, which uses BERT to model discourse coherence from different perspectives, driven by Centering theory. Our experiments encompass 16 non-discourse and discourse metrics, including DiscoScore and popular coherence models, evaluated on summarization and document-level machine translation (MT). We find that (i) the majority of BERT-based metrics correlate much worse with human-rated coherence than early discourse metrics, invented a decade ago; (ii) the recent state-of-the-art BARTScore is weak when operated at system level -- which is particularly problematic as systems are typically compared in this manner. DiscoScore, in contrast, achieves strong system-level correlation with human ratings, not only in coherence but also in factual consistency and other aspects, and surpasses BARTScore by over 10 correlation points on average. Further, aiming to understand DiscoScore, we provide justifications to the importance of discourse coherence for evaluation metrics, and explain the superiority of one variant over another. Our code is available at https://github.com/AIPHES/DiscoScore.


Exploring Data Augmentation for Code Generation Tasks

arXiv.org Artificial Intelligence

Advances in natural language processing, such as transfer learning from pre-trained language models, have impacted how models are trained for programming language tasks too. Previous research primarily explored code pre-training and expanded it through multi-modality and multi-tasking, yet the data for downstream tasks remain modest in size. Focusing on data utilization for downstream tasks, we propose and adapt augmentation methods that yield consistent improvements in code translation and summarization by up to 6.9% and 7.5% respectively. Further analysis suggests that our methods work orthogonally and show benefits in output code style and numeric consistency. We also discuss test data imperfections.


"Unlocking the Potential of Machine Translation Through Dataset Training, Validation, and…

#artificialintelligence

The coronavirus pandemic has changed the way we live, work, and interact with each other. We've all had to make adjustments to the way we do things, including the way we shop. We're now seeing a shift towards contactless and digital payments, which has made it easier for us to stay safe and healthy while still being able to purchase the items we need. Contactless payments have become increasingly popular during the pandemic and offer a range of benefits. Not only are they faster, more convenient, and more secure than traditional payment methods, but they also provide an extra layer of protection from the virus.


Advances in Automatically Rating the Trustworthiness of Text Processing Services

arXiv.org Artificial Intelligence

AI services are known to have unstable behavior when subjected to changes in data, models or users. Such behaviors, whether triggered by omission or commission, lead to trust issues when AI works with humans. The current approach of assessing AI services in a black box setting, where the consumer does not have access to the AI's source code or training data, is limited. The consumer has to rely on the AI developer's documentation and trust that the system has been built as stated. Further, if the AI consumer reuses the service to build other services which they sell to their customers, the consumer is at the risk of the service providers (both data and model providers). Our approach, in this context, is inspired by the success of nutritional labeling in the food industry to promote health and seeks to assess and rate AI services for trust from the perspective of an independent stakeholder. The ratings become a means to communicate the behavior of AI systems so that the consumer is informed about the risks and can make an informed decision. In this paper, we will first describe recent progress in developing rating methods for text-based machine translator AI services that have been found promising with user studies. Then, we will outline challenges and a vision for principled, multi-modal, causality-based rating methodologies and their implications for decision-support in real-world scenarios like health and food recommendation.


CoNT: Contrastive Neural Text Generation

arXiv.org Artificial Intelligence

Recently, contrastive learning has attracted increasing interest in neural text generation as a new solution to alleviate the exposure bias problem. It introduces a sequence-level training signal which is crucial to generation tasks that always rely on auto-regressive decoding. However, previous methods using contrastive learning in neural text generation usually lead to inferior performance. In this paper, we analyse the underlying reasons and propose a new Contrastive Neural Text generation framework, CoNT. CoNT addresses bottlenecks that prevent contrastive learning from being widely adopted in generation tasks from three aspects -- the construction of contrastive examples, the choice of the contrastive loss, and the strategy in decoding. We validate CoNT on five generation tasks with ten benchmarks, including machine translation, summarization, code comment generation, data-to-text generation and commonsense generation. Experimental results show that CoNT clearly outperforms the conventional training framework on all ten benchmarks by a convincing margin. In particular, CoNT surpasses the previously most competitive contrastive learning method for text generation by 1.50 BLEU on machine translation and 1.77 ROUGE-1 on summarization. It achieves a new state of the art on summarization, code comment generation (without external data) and data-to-text generation.