Machine Translation
Considerations for meaningful sign language machine translation based on glosses
Müller, Mathias, Jiang, Zifan, Moryossef, Amit, Rios, Annette, Ebling, Sarah
Automatic sign language processing is gaining popularity in Natural Language Processing (NLP) research (Yin et al., 2021). In machine translation (MT) in particular, sign language translation based on glosses is a prominent approach. In this paper, we review recent works on neural gloss translation. We find that limitations of glosses in general and limitations of specific datasets are not discussed in a transparent manner and that there is no common standard for evaluation. To address these issues, we put forward concrete recommendations for future research on gloss translation. Our suggestions advocate awareness of the inherent limitations of gloss-based approaches, realistic datasets, stronger baselines and convincing evaluation.
Beyond Counting Datasets: A Survey of Multilingual Dataset Construction and Necessary Resources
Yu, Xinyan Velocity, Asai, Akari, Chatterjee, Trina, Hu, Junjie, Choi, Eunsol
While the NLP community is generally aware of resource disparities among languages, we lack research that quantifies the extent and types of such disparity. Prior surveys estimating the availability of resources based on the number of datasets can be misleading as dataset quality varies: many datasets are automatically induced or translated from English data. To provide a more comprehensive picture of language resources, we examine the characteristics of 156 publicly available NLP datasets. We manually annotate how they are created, including input text and label sources and tools used to build them, and what they study, tasks they address and motivations for their creation. After quantifying the qualitative NLP resource gap across languages, we discuss how to improve data collection in low-resource languages. We survey language-proficient NLP researchers and crowd workers per language, finding that their estimated availability correlates with dataset availability. Through crowdsourcing experiments, we identify strategies for collecting high-quality multilingual data on the Mechanical Turk platform. We conclude by making macro and micro-level suggestions to the NLP community and individual researchers for future multilingual data development.
Distance Metric Learning Loss Functions in Few-Shot Scenarios of Supervised Language Models Fine-Tuning
Sosnowski, Witold, Seweryn, Karolina, Wróblewska, Anna, Gawrysiak, Piotr
This paper presents an analysis regarding an influence of the Distance Metric Learning (DML) loss functions on the supervised fine-tuning of the language models for classification tasks. We experimented with known datasets from SentEval Transfer Tasks. Our experiments show that applying the DML loss function can increase performance on downstream classification tasks of RoBERTa-large models in few-shot scenarios. Models fine-tuned with the use of SoftTriple loss can achieve better results than models with a standard categorical cross-entropy loss function by about 2.89 percentage points from 0.04 to 13.48 percentage points depending on the training dataset. Additionally, we accomplished a comprehensive analysis with explainability techniques to assess the models' reliability and explain their results.
Deep representation learning: Fundamentals, Perspectives, Applications, and Open Challenges
Baghaei, Kourosh T., Payandeh, Amirreza, Fayyazsanavi, Pooya, Rahimi, Shahram, Chen, Zhiqian, Ramezani, Somayeh Bakhtiari
Machine Learning algorithms have had a profound impact on the field of computer science over the past few decades. These algorithms performance is greatly influenced by the representations that are derived from the data in the learning process. The representations learned in a successful learning process should be concise, discrete, meaningful, and able to be applied across a variety of tasks. A recent effort has been directed toward developing Deep Learning models, which have proven to be particularly effective at capturing high-dimensional, non-linear, and multi-modal characteristics. In this work, we discuss the principles and developments that have been made in the process of learning representations, and converting them into desirable applications. In addition, for each framework or model, the key issues and open challenges, as well as the advantages, are examined.
EPIK: Eliminating multi-model Pipelines with Knowledge-distillation
Laddagiri, Bhavesh, Raj, Yash, Dash, Anshuman
Real-world tasks are largely composed of multiple models, each performing a sub-task in a larger chain of tasks, i.e., using the output from a model as input for another model in a multi-model pipeline. A model like MATRa performs the task of Crosslingual Transliteration in two stages, using English as an intermediate transliteration target when transliterating between two indic languages. We propose a novel distillation technique, EPIK, that condenses two-stage pipelines for hierarchical tasks into a single end-to-end model without compromising performance. This method can create end-to-end models for tasks without needing a dedicated end-to-end dataset, solving the data scarcity problem. The EPIK model has been distilled from the MATra model using this technique of knowledge distillation. The MATra model can perform crosslingual transliteration between 5 languages - English, Hindi, Tamil, Kannada and Bengali. The EPIK model executes the task of transliteration without any intermediate English output while retaining the performance and accuracy of the MATra model. The EPIK model can perform transliteration with an average CER score of 0.015 and average phonetic accuracy of 92.1%. In addition, the average time for execution has reduced by 54.3% as compared to the teacher model and has a similarity score of 97.5% with the teacher encoder. In a few cases, the EPIK model (student model) can outperform the MATra model (teacher model) even though it has been distilled from the MATra model.
BJTU-WeChat's Systems for the WMT22 Chat Translation Task
Liang, Yunlong, Meng, Fandong, Xu, Jinan, Chen, Yufeng, Zhou, Jie
This paper introduces the joint submission of the Beijing Jiaotong University and WeChat AI to the WMT'22 chat translation task for English-German. Based on the Transformer, we apply several effective variants. In our experiments, we utilize the pre-training-then-fine-tuning paradigm. In the first pre-training stage, we employ data filtering and synthetic data generation (i.e., back-translation, forward-translation, and knowledge distillation). In the second fine-tuning stage, we investigate speaker-aware in-domain data generation, speaker adaptation, prompt-based context modeling, target denoising fine-tuning, and boosted self-COMET-based model ensemble. Our systems achieve 0.810 and 0.946 COMET scores. The COMET scores of English-German and German-English are the highest among all submissions.
Summer: WeChat Neural Machine Translation Systems for the WMT22 Biomedical Translation Task
Li, Ernan, Meng, Fandong, Zhou, Jie
This paper introduces WeChat's participation in WMT 2022 shared biomedical translation task on Chinese to English. Our systems are based on the Transformer, and use several different Transformer structures to improve the quality of translation. In our experiments, we employ data filtering, data generation, several variants of Transformer, fine-tuning and model ensemble. Our Chinese$\to$English system, named Summer, achieves the highest BLEU score among all submissions.
u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality
While audio-visual speech models can yield superior performance and robustness compared to audio-only models, their development and adoption are hindered by the lack of labeled and unlabeled audio-visual data and the cost to deploy one model per modality. In this paper, we present u-HuBERT, a self-supervised pre-training framework that can leverage both multimodal and unimodal speech with a unified masked cluster prediction objective. By utilizing modality dropout during pre-training, we demonstrate that a single fine-tuned model can achieve performance on par or better than the state-of-the-art modality-specific models. Moreover, our model fine-tuned only on audio can perform well with audio-visual and visual speech input, achieving zero-shot modality generalization for multiple speech processing tasks.
Lexical Complexity Controlled Sentence Generation
Nie, Jinran, Yang, Liner, Chen, Yun, Kong, Cunliang, Zhu, Junhui, Yang, Erhong
Text generation rarely considers the control of lexical complexity, which limits its more comprehensive practical application. We introduce a novel task of lexical complexity controlled sentence generation, which aims at keywords to sentence generation with desired complexity levels. It has enormous potential in domains such as grade reading, language teaching and acquisition. The challenge of this task is to generate fluent sentences only using the words of given complexity levels. We propose a simple but effective approach for this task based on complexity embedding. Compared with potential solutions, our approach fuses the representations of the word complexity levels into the model to get better control of lexical complexity. And we demonstrate the feasibility of the approach for both training models from scratch and fine-tuning the pre-trained models. To facilitate the research, we develop two datasets in English and Chinese respectively, on which extensive experiments are conducted. Results show that our approach better controls lexical complexity and generates higher quality sentences than baseline methods.
Inferencing the Transformer Model - MachineLearningMastery.com Inferencing the Transformer Model - MachineLearningMastery.com
We have seen how to train the Transformer model on a dataset of English and German sentence pairs and how to plot the training and validation loss curves to diagnose the model's learning performance and decide at which epoch to run inference on the trained model. We are now ready to run inference on the trained Transformer model to translate an input sentence. In this tutorial, you will discover how to run inference on the trained Transformer model for neural machine translation. It provides self-study tutorials with working code to guide you into building a fully-working transformer model that can translate sentences from one language to another... Inferencing the Transformer model Photo by Karsten Würth, some rights reserved. Recall having seen that the Transformer architecture follows an encoder-decoder structure.