AITopics

2211.10271

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Austria > Vienna (0.14)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

arXiv.org Artificial Intelligence Nov-18-2022

Learning an Artificial Language for Knowledge-Sharing in Multilingual Translation

Liu, Danni, Niehues, Jan

The cornerstone of multilingual neural translation is shared representations across languages. Given the theoretically infinite representation power of neural networks, semantically identical sentences are likely represented differently. While representing sentences in the continuous latent space ensures expressiveness, it introduces the risk of capturing irrelevant features, which hinders the learning of a common representation. In this work, we discretize the encoder output latent space of multilingual models by assigning encoder states to entries in a codebook, which in effect represents source sentences in a new artificial language. This discretization process not only offers a new way to interpret the otherwise black-box model representations, but, more importantly, gives potential for increasing robustness in unseen testing conditions. We validate our approach on large-scale experiments with realistic data volumes and domains. When tested in zero-shot conditions, our approach is competitive with two strong alternatives from the literature. We also use the learned artificial language to analyze model behavior, and discover that using a similar bridge language increases knowledge-sharing among the remaining languages.
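The codebook assignment described in this abstract can be illustrated with a minimal sketch. It assumes a nearest-neighbour lookup under squared Euclidean distance; the abstract does not state the distance measure or how the codebook is learned, so the function names, the toy codebook, and the toy encoder states below are all hypothetical.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def discretize(encoder_states: List[Sequence[float]],
               codebook: List[Sequence[float]]) -> List[int]:
    """Map each encoder state to the index of its nearest codebook entry.

    Assumption: nearest-neighbour assignment; the paper may use a
    different assignment rule.
    """
    return [
        min(range(len(codebook)),
            key=lambda k: squared_distance(state, codebook[k]))
        for state in encoder_states
    ]


# Toy data (hypothetical): three codebook entries, three encoder states.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
states = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]]

# Each source position is now a discrete "word" of the artificial language.
codes = discretize(states, codebook)
```

The resulting index sequence is what makes the representation inspectable: two sentences can be compared by the codes they share rather than by continuous vectors.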

computational linguistic, machine learning, natural language

2211.01292

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Thailand > Bangkok > Bangkok (0.04)
North America > Dominican Republic (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Kim, Young Jin, Henry, Rawn, Fahim, Raffy, Awadalla, Hany Hassan

Who Says Elephants Can't Run: Bringing Large Scale MoE Models into Cloud Scale Production

arXiv.org Artificial Intelligence Nov-17-2022

Mixture of Experts (MoE) models with conditional execution of sparsely activated layers have enabled training models with a much larger number of parameters. As a result, these models have achieved significantly better quality on various natural language processing tasks including machine translation. However, it remains challenging to deploy such models in real-life scenarios due to the large memory requirements and inefficient inference. In this work, we introduce a highly efficient inference framework with several optimization approaches to accelerate the computation of sparse models and cut down the memory consumption significantly. While we achieve up to 26x speed-up in terms of throughput, we also reduce the model size almost to one eighth of the original 32-bit float model by quantizing expert weights into 4-bit integers. As a result, we are able to deploy 136x larger models with 27% less cost and significantly better quality compared to the existing solutions. This enables a paradigm shift in deploying large-scale multilingual MoE transformer models, replacing the traditional practice of distilling teacher models into dozens of smaller models per language or task.
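The "almost one eighth" figure follows from the bit widths: 4 bits per weight instead of 32 gives a ratio of 4/32 = 1/8, with a small overhead for quantization scales. The sketch below illustrates this with symmetric quantization and one scale per weight group; the abstract does not specify the quantization scheme, so this scheme, the function names, and the toy weights are assumptions.

```python
from typing import List, Tuple


def quantize_int4(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric quantization of floats to integers in [-8, 7].

    Assumption: one scale per group of weights; the paper's actual
    scheme is not given in the abstract.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int4(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from 4-bit integers."""
    return [v * scale for v in quantized]


def pack_int4(quantized: List[int]) -> bytes:
    """Pack two 4-bit two's-complement values into each byte."""
    if len(quantized) % 2:
        quantized = quantized + [0]  # pad to an even count
    return bytes(
        ((quantized[i] & 0xF) << 4) | (quantized[i + 1] & 0xF)
        for i in range(0, len(quantized), 2)
    )


# Toy expert weights (hypothetical values).
weights = [0.7, -0.3, 0.1, 0.0, -0.7, 0.2, 0.4, 0.5]
q, scale = quantize_int4(weights)
packed = pack_int4(q)

fp32_bytes = 4 * len(weights)      # 32-bit floats: 4 bytes each
ratio = len(packed) / fp32_bytes   # packed payload relative to fp32
restored = dequantize_int4(q, scale)
```

The stored scale is why the reduction is "almost" rather than exactly one eighth: every group of 4-bit values carries a small amount of extra metadata.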

artificial intelligence, computation, natural language

2211.10017

Country: Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

arXiv.org Artificial Intelligence Nov-17-2022

ConNER: Consistency Training for Cross-lingual Named Entity Recognition

Zhou, Ran, Li, Xin, Bing, Lidong, Cambria, Erik, Si, Luo, Miao, Chunyan

Cross-lingual named entity recognition (NER) suffers from data scarcity in the target languages, especially under zero-shot settings. Existing translate-train or knowledge distillation methods attempt to bridge the language gap, but often introduce a high level of noise. To solve this problem, consistency training methods regularize the model to be robust towards perturbations on data or hidden states. However, such methods are likely to violate the consistency hypothesis, or mainly focus on coarse-grained consistency. We propose ConNER as a novel consistency training framework for cross-lingual NER, which comprises: (1) translation-based consistency training on unlabeled target-language data, and (2) dropout-based consistency training on labeled source-language data. ConNER effectively leverages unlabeled target-language data and alleviates overfitting on the source language to enhance the cross-lingual adaptability. Experimental results show our ConNER achieves consistent improvement over various baseline methods.
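A consistency regularizer of the kind this abstract refers to can be sketched as a divergence between two predictions for the same input, for example two forward passes with different dropout masks. The sketch below assumes a symmetric KL divergence over label distributions; the abstract does not give ConNER's exact loss, and the logits here are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for two strictly positive distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def consistency_loss(logits_a: List[float], logits_b: List[float]) -> float:
    """Symmetric KL between two predictions for the same token.

    Assumption: symmetric KL is used here for illustration only.
    """
    p, q = softmax(logits_a), softmax(logits_b)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


# Hypothetical logits over three entity labels from two perturbed passes.
pass_one = [2.0, 0.5, -1.0]
pass_two = [1.6, 0.9, -0.8]

loss = consistency_loss(pass_one, pass_two)
```

Minimizing this term pushes the two perturbed predictions towards each other, which is the sense in which the model is regularized to be robust to the perturbation.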

artificial intelligence, natural language, text processing

2211.09394

Country: Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)

Does Simultaneous Speech Translation need Simultaneous Models?

Papi, Sara, Gaido, Marco, Negri, Matteo, Turchi, Marco

In simultaneous speech translation (SimulST), finding the best trade-off between high translation quality and low latency is a challenging task. To meet the latency constraints posed by the different application scenarios, multiple dedicated SimulST models are usually trained and maintained, generating high computational costs. In this paper, motivated by the increased social and environmental impact caused by these costs, we investigate whether a single model trained offline can serve not only the offline but also the simultaneous task without the need for any additional training or adaptation. Experiments on en->{de, es} indicate that, aside from facilitating the adoption of well-established offline techniques and architectures without affecting latency, the offline solution achieves similar or better translation quality compared to the same model trained in simultaneous settings, as well as being competitive with the SimulST state of the art.

machine translation, natural language, simultaneous speech translation

doi: 10.18653/v1/2022.findings-emnlp.11

2204.03783

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Khan, Abdul Rafae, Kanade, Hrishikesh, Budhrani, Girish Amar, Jhanglani, Preet, Xu, Jia

SIT at MixMT 2022: Fluent Translation Built on Giant Pre-trained Models

This paper describes the Stevens Institute of Technology's submission for the WMT 2022 Shared Task: Code-mixed Machine Translation (MixMT). The task consisted of two subtasks, subtask 1 Hindi/English to Hinglish and subtask 2 Hinglish to English translation. Our findings lie in the improvements made through the use of large pre-trained multilingual NMT models and in-domain datasets, as well as back-translation and ensemble techniques. The translation output is automatically evaluated against the reference translations using ROUGE-L and WER. Our system achieves the 1st position on subtask 2 according to ROUGE-L, WER, and human evaluation, 1st position on subtask 1 according to WER and human evaluation, and 3rd position on subtask 1 with respect to ROUGE-L metric.
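Word error rate (WER), one of the two automatic metrics named in this abstract, is the word-level edit distance between hypothesis and reference divided by the reference length. The sketch below assumes whitespace tokenization and no text normalization; the shared task's official scoring script may differ, and the example sentences are hypothetical.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over word sequences (single-row DP)."""
    row = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev_diag, row[0] = row[0], i
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            prev_diag, row[j] = row[j], min(
                row[j] + 1,        # deletion
                row[j - 1] + 1,    # insertion
                prev_diag + cost,  # substitution or match
            )
    return row[len(hyp)]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance divided by reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


# One substitution (sat -> sit) and one deletion (the) against six words.
score = wer("the cat sat on the mat", "the cat sit on mat")
```

Lower is better for WER, whereas ROUGE-L rewards a longer common subsequence, so a system can rank differently under the two metrics.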

machine learning, natural language, translation

2210.1167

Country:

North America > United States > Texas > Dallas County > Dallas (0.04)
North America > Dominican Republic (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Ciolino, Matthew, Noever, David, Kalin, Josh

Back Translation Survey for Improving Text Augmentation

Natural Language Processing (NLP) relies heavily on training data. Transformers, as they have gotten bigger, have required massive amounts of training data. To satisfy this requirement, text augmentation should be looked at as a way to expand your current dataset and to generalize your models. One text augmentation we will look at is translation augmentation. We take an English sentence and translate it to another language before translating it back to English. In this paper, we look at the effect of 108 different language back translations on various metrics and text embeddings.
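The round trip described in this abstract can be sketched as a small helper that takes any translation function. The `translate` callable and the lookup-table stand-in below are hypothetical placeholders for a real translation system, which the abstract does not name.

```python
from typing import Callable, Dict, List, Tuple

# (text, source language, target language) -> translated text
Translator = Callable[[str, str, str], str]


def back_translate(sentence: str,
                   pivot_languages: List[str],
                   translate: Translator) -> List[str]:
    """Translate English -> pivot -> English and keep new paraphrases."""
    augmented: List[str] = []
    for pivot in pivot_languages:
        forward = translate(sentence, "en", pivot)
        round_trip = translate(forward, pivot, "en")
        # Keep only outputs that differ from the input and are not repeats.
        if round_trip != sentence and round_trip not in augmented:
            augmented.append(round_trip)
    return augmented


# Toy stand-in for a translation model (hypothetical lookup table).
_TOY_TABLE: Dict[Tuple[str, str, str], str] = {
    ("en", "de", "the film was great"): "der Film war toll",
    ("de", "en", "der Film war toll"): "the movie was great",
    ("en", "fr", "the film was great"): "le film était super",
    ("fr", "en", "le film était super"): "the film was great",
}


def toy_translate(text: str, src: str, tgt: str) -> str:
    return _TOY_TABLE[(src, tgt, text)]


augmented = back_translate("the film was great", ["de", "fr"], toy_translate)
```

In this toy run the German round trip yields a paraphrase while the French one returns the input unchanged, which is why only one augmented sentence is kept.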

artificial intelligence, machine learning, natural language

2102.09708

Country:

Asia > Myanmar (0.05)
South America > Brazil (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Bertram, Vincent, Boß, Miriam, Kusmenko, Evgeny, Nachmann, Imke Helene, Rumpe, Bernhard, Trotta, Danilo, Wachtmeister, Louis

Technical Report on Neural Language Models and Few-Shot Learning for Systematic Requirements Processing in MDSE

Systems engineering, in particular in the automotive domain, needs to cope with the massively increasing numbers of requirements that arise during the development process. To guarantee a high product quality and make sure that functional safety standards such as ISO26262 are fulfilled, the exploitation of potentials of model-driven systems engineering in the form of automatic analyses, consistency checks, and tracing mechanisms is indispensable. However, the language in which requirements are written, and the tools needed to operate on them, are highly individual and require domain-specific tailoring. This hinders automated processing of requirements as well as the linking of requirements to models. Introducing formal requirement notations in existing projects leads, on the one hand, to the challenge of translating masses of requirements and process changes and, on the other hand, to the necessity of corresponding training for the requirements engineers. In this paper, based on the analysis of an open-source set of automotive requirements, we derive domain-specific language constructs helping us to avoid ambiguities in requirements and increase the level of formality. The main contribution is the adoption and evaluation of few-shot learning with large pretrained language models for the automated translation of informal requirements to structured languages such as a requirement DSL. We show that support sets of fewer than ten translation examples can suffice to few-shot train a language model to incorporate keywords and implement syntactic rules into informal natural language requirements.

large language model, logic & formal reasoning, machine learning

2211.09084

Country:

North America > Canada (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > France (0.04)

Genre: Research Report (0.64)

Industry: Automobiles & Trucks > Manufacturer (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)

Quark: Controllable Text Generation with Reinforced Unlearning

Lu, Ximing, Welleck, Sean, Hessel, Jack, Jiang, Liwei, Qin, Lianhui, West, Peter, Ammanabrolu, Prithviraj, Choi, Yejin

Large-scale language models often learn behaviors that are misaligned with user expectations. Generated text may contain offensive or toxic language, contain significant repetition, or be of a different sentiment than desired by the user. We consider the task of unlearning these misalignments by fine-tuning the language model on signals of what not to do. We introduce Quantized Reward Konditioning (Quark), an algorithm for optimizing a reward function that quantifies an (un)wanted property, while not straying too far from the original model. Quark alternates between (i) collecting samples with the current language model, (ii) sorting them into quantiles based on reward, with each quantile identified by a reward token prepended to the language model's input, and (iii) using a standard language modeling loss on samples from each quantile conditioned on its reward token, while remaining close to the original language model via a KL-divergence penalty. By conditioning on a high-reward token at generation time, the model generates text that exhibits less of the unwanted property. For unlearning toxicity, negative sentiment, and repetition, our experiments show that Quark outperforms both strong baselines and state-of-the-art reinforcement learning methods like PPO (Schulman et al. 2017), while relying only on standard language modeling primitives.
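Step (ii) of the loop in this abstract, sorting samples into reward quantiles and marking each with a reward token, can be sketched as follows. The token format `<rq_k>`, the equal-size rank-based binning, and the toy samples are assumptions for illustration; the abstract does not specify them, and steps (i) and (iii) require a language model and are not shown.

```python
from typing import List


def assign_reward_tokens(samples: List[str],
                         rewards: List[float],
                         num_quantiles: int) -> List[str]:
    """Prefix each sample with a token naming its reward quantile.

    Quantile 0 holds the lowest-reward samples and quantile
    num_quantiles - 1 the highest. Token names are hypothetical.
    """
    if len(samples) != len(rewards):
        raise ValueError("samples and rewards must have the same length")
    # Indices of samples ordered from lowest to highest reward.
    order = sorted(range(len(samples)), key=lambda i: rewards[i])
    tagged = [""] * len(samples)
    for rank, i in enumerate(order):
        quantile = min(rank * num_quantiles // len(samples),
                       num_quantiles - 1)
        tagged[i] = f"<rq_{quantile}> {samples[i]}"
    return tagged


# Toy samples with hypothetical reward scores (higher is better).
samples = ["a", "b", "c", "d"]
rewards = [0.9, 0.1, 0.5, 0.3]

tagged = assign_reward_tokens(samples, rewards, num_quantiles=2)
```

At generation time the same idea is applied in reverse: conditioning on the highest-quantile token asks the model for text resembling its best-scoring samples.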

computational linguistic, machine learning, reinforcement learning

2205.13636

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Texas > Travis County > Austin (0.04)

Genre: Research Report (1.00)

Industry:

Government > Military (1.00)
Law > Statutes (0.93)
Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

TSMind: Alibaba and Soochow University's Submission to the WMT22 Translation Suggestion Task

Ge, Xin, Wang, Ke, Wang, Jiayi, Xiao, Nini, Duan, Xiangyu, Zhao, Yu, Zhang, Yuqi

This paper describes the joint submission of Alibaba and Soochow University, TSMind, to the WMT 2022 Shared Task on Translation Suggestion (TS). We participate in the English-German and English-Chinese tasks. We adopt the paradigm of fine-tuning large-scale pre-trained models on the downstream tasks, which has recently achieved great success. We choose FAIR's WMT19 English-German news translation system and MBART50 for English-Chinese as our pre-trained models. Considering the task's condition of limited use of training data, we follow the data augmentation strategies proposed by WeTS to boost our TS model performance. The difference is that we further involve the dual conditional cross-entropy model and GPT-2 language model to filter augmented data. The final leaderboard shows that our submissions are ranked first in three of four language directions in the Naive TS task of the WMT22 Translation Suggestion task.

artificial intelligence, machine learning, natural language

2211.08987

Country:

Europe > Germany > Saxony > Leipzig (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)