Goto

Collaborating Authors

 Machine Translation


$\bar{G}_{mst}$:An Unbiased Stratified Statistic and a Fast Gradient Optimization Algorithm Based on It

arXiv.org Machine Learning

It is difficult to optimize a giant model with deep and wider layers. Similar to most optimization algorithms, training a deep model with gradient method (SGD-like Algorithms) has disadvantages such as easy to fall into local minima or saddle point and slow convergence speed. There have been a lot of researches on the improvement of the gradient method, and a considerable part of these researches focus on how to refine the search direction while keeping the iteration cost as low as possible to accelerate the convergence of the algorithm[10, 11, 12, 13, 14, 15, 16]. These improvements for the search direction are roughly divided into two categories. One is the momentum method[11] based on the principles of physics and the corresponding improved algorithms[12, 20, 21], the momentum method avoids excessive swing amplitude of the search track by retaining part of the potential energy of the original track to accelerate the convergence.


Beam Search with Bidirectional Strategies for Neural Response Generation

arXiv.org Artificial Intelligence

Sequence-to-sequence neural networks have been widely used in language-based applications as they have flexible capabilities to learn various language models. However, when seeking for the optimal language response through trained neural networks, current existing approaches such as beam-search decoder strategies are still not able reaching to promising performances. Instead of developing various decoder strategies based on a "regular sentence order" neural network (a trained model by outputting sentences from left-to-right order), we leveraged "reverse" order as additional language model (a trained model by outputting sentences from right-to-left order) which can provide different perspectives for the path finding problems. In this paper, we propose bidirectional strategies in searching paths by combining two networks (left-to-right and right-to-left language models) making a bidirectional beam search possible. Besides, our solution allows us using any similarity measure in our sentence selection criterion. Our approaches demonstrate better performance compared to the unidirectional beam search strategy.


Pretrained Language Models are Symbolic Mathematics Solvers too!

arXiv.org Machine Learning

Solving symbolic mathematics has always been of in the arena of human ingenuity that needs compositional reasoning and recurrence. However, recent studies have shown that large-scale language models such as transformers are universal and surprisingly can be trained as a sequence-to-sequence task to solve complex mathematical equations. These large transformer models need humongous amounts of training data to generalize to unseen symbolic mathematics problems. In this paper, we present a sample efficient way of solving the symbolic tasks by first pretraining the transformer model with language translation and then fine-tuning the pretrained transformer model to solve the downstream task of symbolic mathematics. We achieve comparable accuracy on the integration task with our pretrained model while using around $1.5$ orders of magnitude less number of training samples with respect to the state-of-the-art deep learning for symbolic mathematics. The test accuracy on differential equation tasks is considerably lower comparing with integration as they need higher order recursions that are not present in language translations. We pretrain our model with different pairs of language translations. Our results show language bias in solving symbolic mathematics tasks. Finally, we study the robustness of the fine-tuned model on symbolic math tasks against distribution shift, and our approach generalizes better in distribution shift scenarios for the function integration.


The Low-Resource Double Bind: An Empirical Study of Pruning for Low-Resource Machine Translation

arXiv.org Artificial Intelligence

A "bigger is better" explosion in the number of parameters in deep neural networks has made it increasingly challenging to make state-of-the-art networks accessible in compute-restricted environments. Compression techniques have taken on renewed importance as a way to bridge the gap. However, evaluation of the trade-offs incurred by popular compression techniques has been centered on high-resource datasets. In this work, we instead consider the impact of compression in a data-limited regime. We introduce the term low-resource double bind to refer to the co-occurrence of data limitations and compute resource constraints. This is a common setting for NLP for low-resource languages, yet the trade-offs in performance are poorly studied. Our work offers surprising insights into the relationship between capacity and generalization in data-limited regimes for the task of machine translation. Our experiments on magnitude pruning for translations from English into Yoruba, Hausa, Igbo and German show that in low-resource regimes, sparsity preserves performance on frequent sentences but has a disparate impact on infrequent ones. However, it improves robustness to out-of-distribution shifts, especially for datasets that are very distinct from the training distribution. Our findings suggest that sparsity can play a beneficial role at curbing memorization of low frequency attributes, and therefore offers a promising solution to the low-resource double bind.


Concept of Text Summarization

#artificialintelligence

This technique, unlike extraction, relies on being able to paraphrase and shorten parts of a document using advanced natural language techniques. Abstractive summarization methods aim at producing summary by interpreting the text using advanced natural language techniques in order to generate a new shorter text -- parts of which may not appear as part of the original document, that conveys the most critical information from the original text, requiring rephrasing sentences and incorporating information from full text to generate summaries such as a human-written abstract usually does. In fact, an acceptable abstractive summary covers core information in the input and is linguistically fluent. Abstractive methods take advantage of recent developments in deep learning. Since it can be regarded as a sequence mapping task where the source text should be mapped to the target summary, abstractive methods take advantage of the recent success of the sequence to sequence models. These models consist of an encoder and a decoder, where a neural network reads the text, encodes it, and then generates target text. In general, building abstract summaries is a challenging task, which is relatively harder than data-driven approaches such as sentence extraction and involves complex language modeling. Thus, they are still far away from reaching human-level quality in summary generation, despite recent progress using neural networks inspired by the progress of neural machine translation and sequence to sequence models. The benefits of Automatic Text Summarization go beyond solving apparent problems.


Improving Zero-shot Multilingual Neural Machine Translation for Low-Resource Languages

arXiv.org Artificial Intelligence

Although the multilingual Neural Machine Translation(NMT), which extends Google's multilingual NMT, has ability to perform zero-shot translation and the iterative self-learning algorithm can improve the quality of zero-shot translation, it confronts with two problems: the multilingual NMT model is prone to generate wrong target language when implementing zero-shot translation; the self-learning algorithm, which uses beam search to generate synthetic parallel data, demolishes the diversity of the generated source language and amplifies the impact of the same noise during the iterative learning process. In this paper, we propose the tagged-multilingual NMT model and improve the self-learning algorithm to handle these two problems. Firstly, we extend the Google's multilingual NMT model and add target tokens to the target languages, which associates the start tag with the target language to ensure that the source language can be translated to the required target language. Secondly, we improve the self-learning algorithm by replacing beam search with random sample to increases the diversity of the generated data and makes it properly cover the true data distribution. Experimental results on IWSLT show that the adjusted tagged-multilingual NMT separately obtains 9.41 and 7.85 BLEU scores over the multilingual NMT on 2010 and 2017 Romanian-Italian test sets. Similarly, it obtains 9.08 and 7.99 BLEU scores on Italian-Romanian zero-shot translation. Furthermore, the improved self-learning algorithm shows its superiorities over the conventional self-learning algorithm on zero-shot translations.


Translating from Morphologically Complex Languages: A Paraphrase-Based Approach

arXiv.org Artificial Intelligence

We propose a novel approach to translating from a morphologically complex language. Unlike previous research, which has targeted word inflections and concatenations, we focus on the pairwise relationship between morphologically related words, which we treat as potential paraphrases and handle using paraphrasing techniques at the word, phrase, and sentence level. An important advantage of this framework is that it can cope with derivational morphology, which has so far remained largely beyond the capabilities of statistical machine translation systems. Our experiments translating from Malay, whose morphology is mostly derivational, into English show significant improvements over rivaling approaches based on five automatic evaluation measures (for 320,000 sentence pairs; 9.5 million English word tokens).


Analyzing the Use of Character-Level Translation with Sparse and Noisy Datasets

arXiv.org Artificial Intelligence

This paper provides an analysis of character-level machine translation models used in pivot-based translation when applied to sparse and noisy datasets, such as crowdsourced movie subtitles. In our experiments, we find that such character-level models cut the number of untranslated words by over 40% and are especially competitive (improvements of 2-3 BLEU points) in the case of limited training data. We explore the impact of character alignment, phrase table filtering, bitext size and the choice of pivot language on translation quality. We further compare cascaded translation models to the use of synthetic training data via multiple pivots, and we find that the latter works significantly better. Finally, we demonstrate that neither word-nor character-BLEU correlate perfectly with human judgments, due to BLEU's sensitivity to length.


Improved statistical machine translation using monolingual paraphrases

arXiv.org Artificial Intelligence

We propose a novel monolingual sentence paraphrasing method for augmenting the training data for statistical machine translation systems "for free" -- by creating it from data that is already available rather than having to create more aligned data. Starting with a syntactic tree, we recursively generate new sentence variants where noun compounds are paraphrased using suitable prepositions, and vice-versa -- preposition-containing noun phrases are turned into noun compounds. The evaluation shows an improvement equivalent to 33%-50% of that of doubling the amount of training data.


GERNERMED -- An Open German Medical NER Model

arXiv.org Artificial Intelligence

The current state of adoption of well-structured electronic health records and integration of digital methods for storing medical patient data in structured formats can often considered as inferior compared to the use of traditional, unstructured text based patient data documentation. Data mining in the field of medical data analysis often needs to rely solely on processing of unstructured data to retrieve relevant data. In natural language processing (NLP), statistical models have been shown successful in various tasks like part-of-speech tagging, relation extraction (RE) and named entity recognition (NER). In this work, we present GERNERMED, the first open, neural NLP model for NER tasks dedicated to detect medical entity types in German text data. Here, we avoid the conflicting goals of protection of sensitive patient data from training data extraction and the publication of the statistical model weights by training our model on a custom dataset that was translated from publicly available datasets in foreign language by a pretrained neural machine translation model. The sample code and the statistical model is available at: https://github.com/frankkramer-lab/GERNERMED