Machine Translation
Applying Machine Learning to Everyday Life
Navigation, online purchases, social media browsing, and streaming services are all shaped by machine learning in one way or another. FREMONT, CA: Machine learning, a subset of artificial intelligence, is receiving a new wave of attention. The resurgence in interest is attributed to several factors, including powerful and affordable computational processing, growing volumes of big data, and affordable data storage options. Machine learning teaches machines to recognize patterns in data and apply them to specific problems. When new data is presented to machine learning models, they adapt independently to make sense of it.
Baechi: Fast Device Placement of Machine Learning Graphs
Jeon, Beomyeol, Cai, Linda, Shetty, Chirag, Srivastava, Pallavi, Jiang, Jintao, Ke, Xiaolan, Meng, Yitao, Xie, Cong, Gupta, Indranil
Machine learning graphs (or models) can be challenging or impossible to train when devices have limited memory or models are large. To split the model across devices, learning-based approaches are still popular. While these result in model placements that train fast on data (i.e., low step times), learning-based model-parallelism is time-consuming, taking many hours or days to create a placement plan of operators on devices. We present the Baechi system, the first to adopt an algorithmic approach to the placement problem for running machine learning training graphs on small clusters of memory-constrained devices. We integrate our implementation of Baechi into two popular open-source learning frameworks: TensorFlow and PyTorch. Our experimental results using GPUs show that: (i) Baechi generates placement plans 654x to 206Kx faster than state-of-the-art learning-based approaches, and (ii) the step (training) time of Baechi-placed models is comparable to expert placements in PyTorch, and only up to 6.2% worse than expert placements in TensorFlow. We prove mathematically that our two algorithms are within a constant factor of the optimal. Our work shows that, compared to learning-based approaches, algorithmic approaches can face different challenges in adapting to machine learning systems, but they also offer proven bounds and significant performance benefits.
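To give a feel for what an algorithmic (non-learned) placement can look like, here is a minimal sketch that walks operators in topological order and fills devices one after another up to a memory cap. It is not the algorithm from the paper, which the abstract does not describe; the helper `place_topological`, the operator names, and the memory figures are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def place_topological(deps, mem, num_devices, cap):
    """Assign each operator to a device index without exceeding `cap` per device.

    deps: dict mapping an operator to the set of operators it depends on.
    mem:  dict mapping an operator to the memory it needs (arbitrary units).
    """
    order = list(TopologicalSorter(deps).static_order())
    placement = {}
    used = [0] * num_devices
    dev = 0
    for op in order:
        # Advance to the next device once the current one cannot hold the operator.
        while dev < num_devices and used[dev] + mem[op] > cap:
            dev += 1
        if dev == num_devices:
            raise MemoryError(f"operator {op!r} does not fit on any device")
        placement[op] = dev
        used[dev] += mem[op]
    return placement


# Hypothetical four-operator chain, two devices with 4 units of memory each.
chain = {"a": set(), "b": {"a"}, "c": {"b"}, "d": {"c"}}
sizes = {"a": 2, "b": 2, "c": 2, "d": 2}
plan = place_topological(chain, sizes, num_devices=2, cap=4)
```

A plan like this is computed in a single pass over the graph, which is the kind of cost profile that makes algorithmic placement attractive compared with hours of learned search. The sketch ignores communication cost and step time entirely.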
Machine Translation for Accessible Multi-Language Text Analysis
Chew, Edward W., Weisman, William D., Huang, Jingying, Frey, Seth
English is the international standard of social research, but scholars are increasingly conscious of their responsibility to meet the need for scholarly insight into communication processes globally. This tension is as true in computational methods as any other area, with revolutionary advances in the tools for English language texts leaving most other languages far behind. In this paper, we aim to leverage those very advances to demonstrate that multi-language analysis is currently accessible to all computational scholars. We show that English-trained measures computed after translation to English have adequate-to-excellent accuracy compared to source-language measures computed on original texts. We show this for three major analytics -- sentiment analysis, topic analysis, and word embeddings -- over 16 languages, including Spanish, Chinese, Hindi, and Arabic. We validate this claim by comparing predictions on original language tweets and their backtranslations: double translations from their source language to English and back to the source language. Overall, our results suggest that Google Translate, a simple and widely accessible tool, is effective in preserving semantic content across languages and methods. Modern machine translation can thus help computational scholars make more inclusive and general claims about human communication.
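The backtranslation check described in the abstract above can be sketched as a simple agreement rate. The `translate` and `classify` arguments are hypothetical callables supplied by the caller (for instance a wrapper around a translation service and a sentiment classifier); no real API is assumed here.

```python
def backtranslation_agreement(texts, source_lang, translate, classify):
    """Fraction of texts whose predicted label survives a round trip via English.

    translate(text, src, dst) -> str and classify(text) -> label are
    caller-supplied, hypothetical callables.
    """
    if not texts:
        raise ValueError("texts must be non-empty")
    same = 0
    for text in texts:
        english = translate(text, source_lang, "en")
        round_trip = translate(english, "en", source_lang)
        # Compare the prediction on the original with the one on the backtranslation.
        if classify(text) == classify(round_trip):
            same += 1
    return same / len(texts)
```

A high agreement rate suggests that translation preserved whatever the classifier relies on. It does not by itself show that the classifier is accurate in the source language.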
Improving Machine Translation with Phrase Pair Injection and Corpus Filtering
Batheja, Akshay, Bhattacharyya, Pushpak
In this paper, we show that the combination of Phrase Pair Injection and Corpus Filtering boosts the performance of Neural Machine Translation (NMT) systems. We extract parallel phrases and sentences from the pseudo-parallel corpus and augment it with the parallel corpus to train the NMT models. With the proposed approach, we observe an improvement in the Machine Translation (MT) system for 3 low-resource language pairs, Hindi-Marathi, English-Marathi, and English-Pashto, and 6 translation directions by up to 2.7 BLEU points, on the FLORES test data. These BLEU score improvements are over the models trained using the whole pseudo-parallel corpus augmented with the parallel corpus.
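As a rough illustration of the filter-then-augment idea, the sketch below keeps only pseudo-parallel pairs that pass a quality threshold and appends them to the parallel corpus. The abstract does not say how pairs are scored, so the `score` callable and the `threshold` value are assumptions, and phrase extraction is left out.

```python
def build_training_corpus(parallel, pseudo_parallel, score, threshold):
    """Return the parallel corpus plus the pseudo-parallel pairs that pass the filter.

    score(src, tgt) -> float is a hypothetical quality measure supplied by the caller.
    """
    kept = [(src, tgt) for src, tgt in pseudo_parallel if score(src, tgt) >= threshold]
    return list(parallel) + kept
```

In practice the scoring function might be a sentence-embedding similarity or a length-ratio heuristic; which one works best is an empirical question this sketch does not answer.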
Self-Training Vision Language BERTs with a Unified Conditional Model
Yang, Xiaofeng, Lv, Fengmao, Liu, Fayao, Lin, Guosheng
Natural language BERTs are trained with language corpus in a self-supervised manner. [...] Large scale pretraining has become the dominating approach in various natural language processing tasks. The success of large scale pretraining is due to a large amount of training data available everywhere and the self-training algorithm. [...] has shown its effectiveness in various tasks [4], [5], how to use it effectively in training vision language BERTs is not yet studied. In this paper, we propose a self-training approach that allows to pretrain VL-BERTs using unlabeled image data. Self-training is usually done by iterating the following three steps: 1) training with labeled data, 2) generating pseudo labels for unlabeled data, 3) mixing the labeled data and unlabeled [...]. However, the self-training of vision language BERTs is nontrivial due to the following reasons. First, although auto-encoding models [...] language setting. Although these models can be finetuned to perform [...]. Second, current common practice in vision language BERT pretraining uses various image descriptions to train, such as [...]. Those image descriptions have significant differences, making it difficult for an unconditional model to learn to generate adequate pseudo captions for unlabeled images. [Figure caption: An example of generated image descriptions. Given different condition flags, our proposed UCM model is able to generate diverse image descriptions, such as COCO caption, dense caption, and questions. It's clear that the generated contents have different styles.]
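The generic three-step self-training loop mentioned in the text above (train on labeled data, pseudo-label the unlabeled data, retrain on the mix) can be sketched as follows. This is a minimal, model-agnostic sketch: the `train` callable and the number of rounds are assumptions, and nothing here reflects the conditional model the paper proposes.

```python
def self_train(labeled, unlabeled, train, rounds=2):
    """Generic self-training loop.

    labeled:   list of (x, y) pairs.
    unlabeled: list of x values.
    train:     hypothetical callable; train(data) returns a model, itself a callable x -> y.
    """
    model = train(labeled)  # step 1: train with labeled data
    for _ in range(rounds):
        # Step 2: generate pseudo labels for the unlabeled data.
        pseudo = [(x, model(x)) for x in unlabeled]
        # Step 3: mix labeled and pseudo-labeled data and retrain.
        model = train(labeled + pseudo)
    return model
```

Each round treats the previous model as the teacher for the next one, so errors in early pseudo labels can propagate; real systems usually filter or weight pseudo labels by confidence.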
JCSE: Contrastive Learning of Japanese Sentence Embeddings and Its Applications
Chen, Zihao, Handa, Hisashi, Shirahama, Kimiaki
Contrastive learning is widely used for sentence representation learning. Despite this prevalence, most studies have focused exclusively on English and few concern domain adaptation for domain-specific downstream tasks, especially for low-resource languages like Japanese, which are characterized by insufficient target domain data and the lack of a proper training strategy. To overcome this, we propose a novel Japanese sentence representation framework, JCSE (derived from ``Contrastive learning of Sentence Embeddings for Japanese''), that creates training data by generating sentences and synthesizing them with sentences available in a target domain. Specifically, a pre-trained data generator is finetuned to a target domain using our collected corpus. It is then used to generate contradictory sentence pairs that are used in contrastive learning for adapting a Japanese language model to a specific task in the target domain. Another problem of Japanese sentence representation learning is the difficulty of evaluating existing embedding methods due to the lack of benchmark datasets. Thus, we establish a comprehensive Japanese Semantic Textual Similarity (STS) benchmark on which various embedding models are evaluated. Based on this benchmark result, multiple embedding methods are chosen and compared with JCSE on two domain-specific tasks, STS in a clinical domain and information retrieval in an educational domain. The results show that JCSE achieves significant performance improvement surpassing direct transfer and other training strategies. This empirically demonstrates JCSE's effectiveness and practicability for downstream tasks of a low-resource language.
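For readers unfamiliar with contrastive objectives, here is a minimal sketch of an InfoNCE-style loss over sentence embeddings, using only the standard library. It uses in-batch negatives and a temperature of 0.05 as assumptions; it is not claimed to be the exact objective used by JCSE, which additionally relies on generated contradictory pairs.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss; positives[i] is the positive for anchors[i].

    All other entries of `positives` act as in-batch negatives.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)
```

The loss is small when each anchor is closer to its own positive than to the other sentences in the batch, and grows when the pairing is wrong.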
Language Embeddings Sometimes Contain Typological Generalizations
Östling, Robert, Kurfalı, Murathan
To what extent can neural network models learn generalizations about language structure, and how do we find out what they have learned? We explore these questions by training neural models for a range of natural language processing tasks on a massively multilingual dataset of Bible translations in 1295 languages. The learned language representations are then compared to existing typological databases as well as to a novel set of quantitative syntactic and morphological features obtained through annotation projection. We conclude that some generalizations are surprisingly close to traditional features from linguistic typology, but that most of our models, as well as those of previous work, do not appear to have made linguistically meaningful generalizations. Careful attention to details in the evaluation turns out to be essential to avoid false positives. Furthermore, to encourage continued work in this field, we release several resources covering most or all of the languages in our data: (i) multiple sets of language representations, (ii) multilingual word embeddings, (iii) projected and predicted syntactic and morphological features, (iv) software to provide linguistically sound evaluations of language representations.
Prompting Large Language Model for Machine Translation: A Case Study
Zhang, Biao, Haddow, Barry, Birch, Alexandra
Research on prompting has shown excellent performance with little or even no supervised training across many tasks. However, prompting for machine translation is still under-explored in the literature. We fill this gap by offering a systematic study on prompting strategies for translation, examining various factors for prompt template and demonstration example selection. We further explore the use of monolingual data and the feasibility of cross-lingual, cross-domain, and sentence-to-document transfer learning in prompting. Extensive experiments with GLM-130B (Zeng et al., 2022) as the testbed show that 1) the number and the quality of prompt examples matter, where using suboptimal examples degenerates translation; 2) several features of prompt examples, such as semantic similarity, show significant Spearman correlation with their prompting performance; yet, none of the correlations are strong enough; 3) using pseudo parallel prompt examples constructed from monolingual data via zero-shot prompting could improve translation; and 4) improved performance is achievable by transferring knowledge from prompt examples selected in other settings. We finally provide an analysis on the model outputs and discuss several problems that prompting still suffers from.
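To make the idea of a prompt template with demonstration examples concrete, here is a minimal sketch that assembles a few-shot translation prompt. The "Language: sentence" layout, the default language names, and the `build_prompt` helper are assumptions for illustration, not necessarily the template studied in the paper.

```python
def build_prompt(examples, source, src_lang="German", tgt_lang="English"):
    """Build a few-shot translation prompt.

    examples: list of (source_sentence, target_sentence) demonstrations;
              an empty list gives a zero-shot prompt.
    source:   the sentence to translate.
    """
    lines = []
    for src, tgt in examples:
        lines.append(f"{src_lang}: {src}")
        lines.append(f"{tgt_lang}: {tgt}")
    # The final target line is left open for the model to complete.
    lines.append(f"{src_lang}: {source}")
    lines.append(f"{tgt_lang}:")
    return "\n".join(lines)
```

Which demonstrations go into `examples`, and how many, is exactly the selection question the study examines.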
HanoiT: Enhancing Context-aware Translation via Selective Context
Yang, Jian, Yin, Yuwei, Ma, Shuming, Yang, Liqun, Guo, Hongcheng, Huang, Haoyang, Zhang, Dongdong, Zeng, Yutao, Li, Zhoujun, Wei, Furu
Context-aware neural machine translation aims to use the document-level context to improve translation quality. However, not all words in the context are helpful. The irrelevant or trivial words may bring some noise and distract the model from learning the relationship between the current sentence and the auxiliary context. To mitigate this problem, we propose a novel end-to-end encoder-decoder model with a layer-wise selection mechanism to sift and refine the long document context. To verify the effectiveness of our method, extensive experiments and extra quantitative analysis are conducted on four document-level machine translation benchmarks. The experimental results demonstrate that our model significantly outperforms previous models on all datasets via the soft selection mechanism.
Learning a Formality-Aware Japanese Sentence Representation
Xinyuan, Henry Li, Lee, Ray, Chen, Jerry, Marchisio, Kelly
While the way intermediate representations are generated in encoder-decoder sequence-to-sequence models typically allows them to preserve the semantics of the input sentence, input features such as formality might be left out. On the other hand, downstream tasks such as translation would benefit from working with a sentence representation that preserves formality in addition to semantics, so as to generate sentences with the appropriate level of social formality -- the difference between speaking to a friend versus speaking with a supervisor. We propose a sequence-to-sequence method for learning a formality-aware representation for Japanese sentences, where sentence generation is conditioned on both the original representation of the input sentence and a side constraint which guides the sentence representation towards preserving formality information. Additionally, we propose augmenting the sentence representation with a learned representation of formality which facilitates the extraction of formality in downstream tasks. We address the lack of formality-annotated parallel data by adapting previous works on procedural formality classification of Japanese sentences. Experimental results suggest that our techniques not only help the decoder recover the formality of the input sentence, but also slightly improve the preservation of input sentence semantics.