Machine Translation
Unsupervised Neural Dialect Translation with Commonality and Diversity Modeling
Wan, Yu, Yang, Baosong, Wong, Derek F., Chao, Lidia S., Du, Haihua, Ao, Ben C. H.
As a special machine translation task, dialect translation has two main characteristics: 1) a lack of parallel training corpora; and 2) similar grammar on both sides of the translation. In this paper, we investigate how to exploit the commonality and diversity between dialects so as to build unsupervised translation models with access only to monolingual data. Specifically, we leverage pivot-private embedding, layer coordination, and parameter sharing to model the commonality and diversity between source and target at the lexical, syntactic, and semantic levels. To examine the effectiveness of the proposed models, we collect a monolingual corpus of 20 million for each of Mandarin and Cantonese, which are, respectively, the official language and the most widely used dialect in China. Experimental results reveal that our methods outperform rule-based simplified-traditional Chinese conversion and conventional unsupervised translation models by more than 12 BLEU points.
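The abstract does not spell out how pivot-private embedding is built. The sketch below shows one plausible reading under stated assumptions: a joint vocabulary over both dialects, and each token vector formed by concatenating a "pivot" part shared across languages (commonality) with a part private to each language (diversity). The class name, dimensions, and toy vocabulary are hypothetical, and the authors' actual formulation may differ.

```python
import numpy as np

class PivotPrivateEmbedding:
    """Hypothetical sketch: token vector = [shared pivot part ; language-private part].

    Assumption: concatenation of a shared table and per-language tables.
    This is an illustration, not necessarily the paper's design.
    """

    def __init__(self, vocab, languages, pivot_dim=4, private_dim=2, seed=0):
        rng = np.random.default_rng(seed)
        self.index = {tok: i for i, tok in enumerate(vocab)}
        # One pivot table shared by every language: models commonality.
        self.pivot = rng.normal(size=(len(vocab), pivot_dim))
        # One private table per language: models diversity.
        self.private = {
            lang: rng.normal(size=(len(vocab), private_dim)) for lang in languages
        }

    def embed(self, tokens, lang):
        ids = [self.index[t] for t in tokens]
        # Shared half first, language-specific half second.
        return np.concatenate([self.pivot[ids], self.private[lang][ids]], axis=1)

# Joint toy vocabulary: 是 (Mandarin "to be") and 係 (Cantonese "to be").
vocab = ["我", "你", "係", "是"]
emb = PivotPrivateEmbedding(vocab, languages=["mandarin", "cantonese"])
m = emb.embed(["我", "是"], "mandarin")
c = emb.embed(["我", "係"], "cantonese")
print(m.shape)  # (2, 6)
```

Because the pivot table is indexed identically for both languages, a character such as 我 receives the same shared half in Mandarin and Cantonese input, while its private half is free to differ.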
The quest for better training data
American localization specialist Lionbridge Technologies has been employing machine translation tools for many years. Eventually, its customers started asking for multilingual training data. Today, Lionbridge has a separate division entirely dedicated to AI, doing everything from collection of chatbot training data to image annotation, audio transcription and even multilingual content moderation services. To find out more about the work of the division, AI Business talked to Aristotelis Kostopoulos, vice president of product solutions, artificial intelligence at Lionbridge.
Q: The AI division at Lionbridge grew out of the machine translation business, but today it does so much more.
Conversations at High Altitude - Inside GTS Amsterdam - Welocalize
At a height of 100 m up the amazing A'DAM Tower in central Amsterdam, the altitude wasn't a problem at the Global Transformation Summit (GTS), but keeping up with the many shared experiences and the fast exchange of ideas was! GTS Amsterdam brought together global brands, connecting international business leaders and senior marketing and localization professionals. What was the common ground? Many insights shared and new contacts made. As content types and volumes continue to increase (the amount of content on the internet doubles every 18 months), brands need to converge content, collaborate internally, and ensure the customer experience is consistent and personal in order to stand out from online competition. This means re-imagining how we work and defining how multilingual content performs beyond traditional KPIs.
NVIDIA/OpenSeq2Seq
OpenSeq2Seq's main goal is to allow researchers to explore various sequence-to-sequence models as effectively as possible. Efficiency is achieved by fully supporting distributed and mixed-precision training. OpenSeq2Seq is built on TensorFlow and provides all the building blocks needed to train encoder-decoder models for neural machine translation, automatic speech recognition, speech synthesis, and language modeling. The speech-to-text workflow uses some parts of the Mozilla DeepSpeech project. The beam search decoder with language model re-scoring (in decoders) is based on Baidu DeepSpeech.
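To illustrate what "language model re-scoring" means, here is a minimal sketch that re-ranks a finished n-best list with a Deep Speech 2 style score, log p(y|x) + α·log p_LM(y) + β·word_count(y). It is not OpenSeq2Seq's code: the real Baidu-derived decoder applies the LM inside the beam search itself, whereas this sketch only rescores complete hypotheses. The function names, the toy unigram LM, and the α/β values are hypothetical.

```python
import math

def rescore_nbest(hypotheses, lm_logprob, alpha=0.5, beta=1.0):
    """Re-rank hypotheses by acoustic log-prob + alpha * LM log-prob + beta * word count."""
    def score(h):
        words = h["text"].split()
        # beta * len(words) offsets the LM's bias toward short outputs.
        return h["acoustic_logprob"] + alpha * lm_logprob(words) + beta * len(words)
    return sorted(hypotheses, key=score, reverse=True)

# Hypothetical unigram LM; unknown words get a small floor probability.
UNIGRAM = {"the": math.log(0.5), "cat": math.log(0.3),
           "sat": math.log(0.2), "kat": math.log(0.001)}

def toy_lm(words):
    return sum(UNIGRAM.get(w, math.log(1e-6)) for w in words)

hyps = [
    {"text": "the kat sat", "acoustic_logprob": -4.0},  # acoustically preferred
    {"text": "the cat sat", "acoustic_logprob": -4.5},
]
print(rescore_nbest(hyps, toy_lm)[0]["text"])  # the cat sat
```

With α = 0 the acoustically preferred "the kat sat" wins; adding the LM term flips the ranking toward the more plausible word sequence.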
Your Brief Guide to Natural Language Processing (Part 1)
In recent years, natural language processing (NLP) has become a part of our everyday lives. Smartphones now come equipped with NLP-powered voice assistants that interpret and understand human speech in order to provide relevant responses to user queries. NLP also helps translation apps break down communication barriers by analyzing input in one language and transforming it into another. Even word processors rely on NLP to check the grammar, logic, and syntax of written input. And NLP is now an integral part of customer service, where it is used to guide people to the right representative through verbal commands. Yet few people understand the role NLP plays in making these applications possible.
Cost-Sensitive Training for Autoregressive Models
Saparina, Irina, Osokin, Anton
Training autoregressive models to better predict under the test metric, instead of maximizing the likelihood, has been reported to be beneficial in several use cases but brings additional complications, which prevent wider adoption. In this paper, we follow the learning-to-search approach (Daumé III et al., 2009; Leblond et al., 2018) and investigate several of its components. First, we propose a way to construct a reference policy based on an alignment between the model output and the ground truth. Our reference policy is optimal when applied to the Kendall-tau distance between permutations (which appear in the word-ordering task) and helps when working with the METEOR score for machine translation. Second, we observe that the learning-to-search approach benefits from choosing costs related to the test metrics. Finally, we study the effect of different learning objectives and find that the standard KL loss learns only a few high-probability tokens and can be replaced with ranking objectives that target these tokens explicitly.
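The Kendall-tau distance mentioned above counts the item pairs that two permutations order differently, which makes it a natural cost for word ordering. A minimal sketch follows, assuming both inputs are permutations of the same distinct items; the function name is hypothetical, and the paper's alignment-based reference policy is not shown.

```python
from itertools import combinations

def kendall_tau_distance(perm_a, perm_b):
    """Number of item pairs ordered differently by two permutations of the same items."""
    pos_b = {item: i for i, item in enumerate(perm_b)}
    # Express perm_a in terms of positions in perm_b; each inversion is a discordant pair.
    mapped = [pos_b[item] for item in perm_a]
    return sum(1 for i, j in combinations(range(len(mapped)), 2) if mapped[i] > mapped[j])

reference = ["the", "cat", "sat"]
print(kendall_tau_distance(["cat", "the", "sat"], reference))  # 1
print(kendall_tau_distance(["sat", "cat", "the"], reference))  # 3
```

This pairwise version is O(n²), which is fine for sentence-length inputs; a merge-sort inversion count brings it down to O(n log n) if needed.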
Re-Translation Strategies For Long Form, Simultaneous, Spoken Language Translation
Arivazhagan, Naveen, Cherry, Colin, I, Te, Macherey, Wolfgang, Baljekar, Pallavi, Foster, George
We investigate the problem of simultaneous machine translation of long-form speech content. We target a continuous speech-to-text scenario, generating translated captions for a live audio feed, such as a lecture or play-by-play commentary. As this scenario allows for revisions to our incremental translations, we adopt a re-translation approach to simultaneous translation, where the source is repeatedly translated from scratch as it grows. This approach naturally exhibits very low latency and high final quality, but at the cost of incremental instability as the output is continuously refined. We experiment with a pipeline of industry-grade speech recognition and translation tools, augmented with simple inference heuristics to improve stability. We use TED Talks as a source of multilingual test data, developing our techniques on English-to-German spoken language translation. Our minimalist approach to simultaneous translation allows us to easily scale our final evaluation to six more target languages, dramatically improving incremental stability for all of them.
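One common way to quantify the incremental instability described above is to count how many already-displayed tokens each new re-translation erases, that is, the tokens of the previous output beyond its longest common prefix with the new one. The sketch below normalizes that total by the final output length and adds a `mask_k` knob that withholds the last k tokens of every non-final output, as one example of a simple stability heuristic. Both the normalization and the heuristic are assumptions here; the paper's exact metrics and heuristics may differ.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of two token lists."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def normalized_erasure(outputs, mask_k=0):
    """Tokens erased across successive re-translations, divided by final output length.

    outputs: one token list per source update; the last entry is the final translation.
    mask_k:  hypothetical heuristic, hide the last k tokens of every non-final output.
    """
    shown = [o[:max(len(o) - mask_k, 0)] for o in outputs[:-1]] + [outputs[-1]]
    erased = sum(len(prev) - common_prefix_len(prev, cur)
                 for prev, cur in zip(shown, shown[1:]))
    return erased / max(len(shown[-1]), 1)

outs = [["Das"], ["Das", "ist"], ["Das", "war", "gut"]]
print(normalized_erasure(outs))            # 0.333... ("ist" was erased once)
print(normalized_erasure(outs, mask_k=1))  # 0.0
```

Masking trades latency for stability: the withheld tokens reach the viewer later, but fewer of them have to be retracted.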
Reinforcement Learning Upside Down: Don't Predict Rewards -- Just Map Them to Actions
We transform reinforcement learning (RL) into a form of supervised learning (SL) by turning traditional RL on its head, calling this Upside Down RL (UDRL). Standard RL predicts rewards, while UDRL instead uses rewards as task-defining inputs, together with representations of time horizons and other computable functions of historic and desired future data. UDRL learns to interpret these input observations as commands, mapping them to actions (or action probabilities) through SL on past (possibly accidental) experience. UDRL generalizes to achieve high rewards or other goals through input commands such as "get lots of reward within at most so much time!" A separate paper [61] reporting first experiments with UDRL shows that even a pilot version of UDRL can outperform traditional baseline algorithms on certain challenging RL problems. We also introduce a related simple but general approach for teaching a robot to imitate humans. First videotape humans imitating the robot's current behaviors, then let the robot learn through SL to map the videos (as input commands) to these behaviors, then let it generalize and imitate videos of humans executing previously unknown behavior. This Imitate-Imitator concept may actually explain why biological evolution has resulted in parents who imitate the babbling of their babies.
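The core UDRL step, learning a map from an observation plus a command (desired return, time horizon) to an action via supervised learning on past episodes, can be sketched as below. Assumptions, not taken from the text: discrete states and actions, commands relabelled from every (start, end) segment of each recorded episode, and a counting table standing in for the neural network a real behavior function would use. All names are hypothetical, and this is not the paper's implementation.

```python
from collections import Counter, defaultdict

def episode_to_examples(states, actions, rewards):
    """Relabel one episode as ((state, desired_return, horizon), action) training pairs."""
    examples = []
    T = len(actions)
    for t in range(T):
        for end in range(t + 1, T + 1):
            # The return actually achieved from t to end becomes the "command".
            desired_return = sum(rewards[t:end])
            horizon = end - t
            examples.append(((states[t], desired_return, horizon), actions[t]))
    return examples

class TabularBehaviorFunction:
    """Supervised 'learning' by counting: command -> most frequent past action."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, examples):
        for key, action in examples:
            self.counts[key][action] += 1
        return self

    def act(self, state, desired_return, horizon):
        seen = self.counts.get((state, desired_return, horizon))
        return seen.most_common(1)[0][0] if seen else None  # None: unseen command

# Two past episodes from state "s0".
data = episode_to_examples(["s0"], ["right"], [1])
data += episode_to_examples(["s0", "s1"], ["left", "right"], [0, 2])
bf = TabularBehaviorFunction().fit(data)
print(bf.act("s0", 2, 2))  # left  ("get 2 reward within 2 steps")
print(bf.act("s0", 1, 1))  # right ("get 1 reward within 1 step")
```

A table cannot generalize to unseen commands, which is exactly why UDRL uses a learned function approximator; the table only makes the supervised relabelling visible.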