 Machine Translation


The Impact of Annotation Guidelines and Annotated Data on Extracting App Features from App Reviews

arXiv.org Machine Learning

Annotation guidelines used to guide the annotation of training and evaluation datasets can have a considerable impact on the quality of machine learning models. In this study, we explore the effects of annotation guidelines on the quality of app feature extraction models. As a main result, we propose several changes to the existing annotation guidelines with the goal of making the extracted app features more useful and informative to app developers. We test the proposed changes by simulating the application of the new annotation guidelines and then evaluating the performance of supervised machine learning models trained on datasets annotated with the initial and the simulated guidelines. While the overall performance of automatic app feature extraction remains the same as that of the model trained on the dataset with the initial annotations, the features extracted by the model trained on the dataset with the simulated new annotations are less noisy and more informative to app developers. Secondly, we are interested in what kind of annotated training data is necessary for training an automatic app feature extraction model. In particular, we explore whether the training set should contain annotated app reviews from the apps or app categories to which the model will subsequently be applied, or whether it is sufficient to have annotated app reviews from any app available for training, even when these apps are from very different categories than the test app. Our experiments show that having annotated training reviews from the test app is not necessary, although including them in the training set helps to improve recall. Furthermore, we test whether augmenting the training set with annotated product reviews helps to improve the performance of app feature extraction. We find that models trained on the augmented training set achieve improved recall, but at the cost of a drop in precision.


Exploring the Use of Attention within an Neural Machine Translation Decoder States to Translate Idioms

arXiv.org Machine Learning

Idioms pose problems for almost all Machine Translation systems. This type of language is very frequent in day-to-day language use and cannot simply be ignored. The recent interest in memory-augmented models in the field of Language Modelling has helped systems achieve good results by bridging long-distance dependencies. In this paper we explore the use of such techniques in a Neural Machine Translation system to help in the translation of idiomatic language.


End-to-End Content and Plan Selection for Data-to-Text Generation

arXiv.org Artificial Intelligence

Learning to generate fluent natural language from structured data with neural networks has become a common approach for NLG. This problem can be challenging when the form of the structured data varies between examples. This paper presents a survey of several extensions to sequence-to-sequence models to account for the latent content selection process, particularly variants of copy attention and coverage decoding. We further propose a training method based on diverse ensembling to encourage models to learn distinct sentence templates during training. An empirical evaluation of these techniques shows an increase in the quality of generated text across five automated metrics, as well as human evaluation.


Understanding the Origins of Bias in Word Embeddings

arXiv.org Machine Learning

The power of machine learning systems not only promises great technical progress, but also risks societal harm. As a recent example, researchers have shown that popular word embedding algorithms exhibit stereotypical biases, such as gender bias. The widespread use of these algorithms in machine learning systems, from automated translation services to curriculum vitae scanners, can amplify stereotypes in important contexts. Although methods have been developed to measure these biases and alter word embeddings to mitigate their biased representations, there is a lack of understanding of how word embedding bias depends on the training data. In this work, we develop a technique for understanding the origins of bias in word embeddings. Given a word embedding trained on a corpus, our method identifies how perturbing the corpus will affect the bias of the resulting embedding. This can be used to trace the origins of word embedding bias back to the original training documents. Using our method, one can investigate trends in the bias of the underlying corpus and identify subsets of documents whose removal would most reduce bias. We demonstrate our techniques on both a New York Times corpus and a Wikipedia corpus and find that our influence function-based approximations are extremely accurate.


Google Translate for iOS can speak in your local accent

Engadget

Until now, using Google Translate on your iPhone has meant listening to the same pronunciation for translations no matter where you live. That's not very considerate, and potentially a problem if you live in countries where foreign accents could make comprehension difficult. You won't have that issue from now on -- an update to Google Translate has added speech output in local versions of multiple languages, including English, Bengali, French and Spanish. You can hear English results with an Indian accent, for instance, or listen to French with a Canadian spin. Android has included these speech options for a while.


Optimal Completion Distillation for Sequence Learning

arXiv.org Machine Learning

We present Optimal Completion Distillation (OCD), a training procedure for optimizing sequence-to-sequence models based on edit distance. OCD is efficient, has no hyper-parameters of its own, and does not require pretraining or joint optimization with conditional log-likelihood. Given a partial sequence generated by the model, we first identify the set of optimal suffixes that minimize the total edit distance, using an efficient dynamic programming algorithm. Then, for each position of the generated sequence, we use a target distribution that puts equal probability on the first token of all the optimal suffixes. OCD achieves state-of-the-art performance on end-to-end speech recognition on both the Wall Street Journal and Librispeech datasets, achieving 9.3% WER and 4.5% WER respectively.
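
The target construction described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `optimal_next_tokens` and `ocd_target_distribution` and the `</s>` end-of-sequence marker are assumptions made for illustration. The sketch builds the edit-distance table between the generated prefix and every prefix of the reference, then collects the reference tokens that follow each minimum-distance position.

```python
def optimal_next_tokens(prefix, target, eos="</s>"):
    """Return the set of first tokens of all optimal suffixes for `prefix`.

    Hypothetical helper: `prefix` is the partial sequence produced so far,
    `target` is the reference sequence, `eos` is an assumed end marker.
    """
    # row[j] holds the edit distance between the prefix seen so far
    # and target[:j]; for the empty prefix this is simply j.
    row = list(range(len(target) + 1))
    for token in prefix:
        new_row = [row[0] + 1]
        for j in range(1, len(target) + 1):
            substitution = row[j - 1] + (token != target[j - 1])
            deletion = row[j] + 1
            insertion = new_row[j - 1] + 1
            new_row.append(min(substitution, deletion, insertion))
        row = new_row

    # Any reference position j with minimal distance can be continued
    # optimally by emitting target[j] (or the end marker when j == len(target)).
    best = min(row)
    extended = list(target) + [eos]
    return {extended[j] for j, dist in enumerate(row) if dist == best}


def ocd_target_distribution(prefix, target, eos="</s>"):
    """Put equal probability on every optimal next token."""
    tokens = optimal_next_tokens(prefix, target, eos)
    return {tok: 1.0 / len(tokens) for tok in tokens}
```

For example, with the reference `list("SUNDAY")` and the generated prefix `list("SA")`, both `U` and `N` start an optimal suffix, so each receives probability 0.5. The sketch recomputes the table for every prefix for clarity; an efficient version would extend the table by one row per generated token.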


AI Translation: Latest Trends - Text United

#artificialintelligence

It is not out of reason to boldly say that translation is of great importance to man. The diversity of languages and cultures in the world makes translation essential to humanity. The benefits of translation to humankind spread across businesses, politics, international relations, tourism, and education. Any company can go global. Moreover, the secret of a successful international business lies in quality translation services.


IncSQL: Training Incremental Text-to-SQL Parsers with Non-Deterministic Oracles

arXiv.org Artificial Intelligence

We present a sequence-to-action parsing approach for the natural language to SQL task that incrementally fills the slots of a SQL query with feasible actions from a pre-defined inventory. To account for the fact that typically there are multiple correct SQL queries with the same or very similar semantics, we draw inspiration from syntactic parsing techniques and propose to train our sequence-to-action models with non-deterministic oracles. We evaluate our models on the WikiSQL dataset and achieve an execution accuracy of 83.7% on the test set, a 2.1% absolute improvement over the models trained with traditional static oracles assuming a single correct target SQL query. When further combined with the execution-guided decoding strategy, our model sets a new state-of-the-art performance at an execution accuracy of 87.1%.


Is Neural Machine Translation Ready for Marketing Content?

#artificialintelligence

Music fans were the first to prove this by making a laughingstock of the app by loading lyrics from songs like Will Smith's "Fresh Prince of Bel-Air" and the theme song from Moana to see what funny or ridiculous translations Google would generate. While the tool isn't nearly as bad as videos make it out to be, this negative PR has kept companies from using it. After all, if Google can't translate song lyrics correctly, why would you trust it with marketing content? But Google Translate doesn't represent all machine translation. However, it is a brand that happens to be well-known and free.


SwiftKey for Android now offers real-time message translation

Engadget

Microsoft has brought its Translator to SwiftKey, allowing users to translate their conversations without having to leave the app they're in. With an update out today, SwiftKey for Android will translate incoming and outgoing messages in real time, and it will be able to do so for over 60 languages. Additionally, while you won't need to install Microsoft Translator to be able to use the new SwiftKey feature, the company says translation will work offline if you do. Microsoft purchased SwiftKey in 2016 and it only makes sense that it would merge its translator with the smart keyboard. Android users can access the feature through SwiftKey's Toolbar -- just tap the plus sign in the upper left corner of the keyboard to get there -- and you can check out which languages are supported here.