Machine Translation
Cost-Sensitive Training for Autoregressive Models
Saparina, Irina, Osokin, Anton
Training autoregressive models to better predict under the test metric, instead of maximizing the likelihood, has been reported to be beneficial in several use cases but brings additional complications, which prevent wider adoption. In this paper, we follow the learning-to-search approach (Daumé III et al., 2009; Leblond et al., 2018) and investigate several of its components. First, we propose a way to construct a reference policy based on an alignment between the model output and ground truth. Our reference policy is optimal when applied to the Kendall-tau distance between permutations (which appear in the task of word ordering) and helps when working with the METEOR score for machine translation. Second, we observe that the learning-to-search approach benefits from choosing the costs related to the test metrics. Finally, we study the effect of different learning objectives and find that the standard KL loss only learns several high-probability tokens and can be replaced with ranking objectives that target these tokens explicitly.
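For readers unfamiliar with the metric named in the abstract above, the Kendall-tau distance between two permutations counts the pairs of items whose relative order differs. The following is a minimal sketch of that definition only, not the paper's reference policy; the function name `kendall_tau_distance` is an assumption chosen for illustration, and the items are assumed to be distinct.

```python
from itertools import combinations


def kendall_tau_distance(perm_a, perm_b):
    """Count item pairs that appear in opposite relative order in the two permutations.

    Assumes both inputs contain the same distinct, hashable items.
    """
    if sorted(perm_a) != sorted(perm_b):
        raise ValueError("inputs must be permutations of the same items")
    # Position of every item in the second permutation.
    pos_b = {item: i for i, item in enumerate(perm_b)}
    # Rewrite the first permutation as positions in the second one.
    ranks = [pos_b[item] for item in perm_a]
    # Each inversion in `ranks` is one discordant pair (quadratic-time count).
    return sum(1 for i, j in combinations(range(len(ranks)), 2) if ranks[i] > ranks[j])


# Word-ordering style example: one adjacent swap gives distance 1.
print(kendall_tau_distance(["the", "cat", "sat"], ["cat", "the", "sat"]))
```

The quadratic pair count keeps the sketch short; a merge-sort based inversion count would be the usual choice for long sequences.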
Re-Translation Strategies For Long Form, Simultaneous, Spoken Language Translation
Arivazhagan, Naveen, Cherry, Colin, I, Te, Macherey, Wolfgang, Baljekar, Pallavi, Foster, George
We investigate the problem of simultaneous machine translation of long-form speech content. We target a continuous speech-to-text scenario, generating translated captions for a live audio feed, such as a lecture or play-by-play commentary. As this scenario allows for revisions to our incremental translations, we adopt a re-translation approach to simultaneous translation, where the source is repeatedly translated from scratch as it grows. This approach naturally exhibits very low latency and high final quality, but at the cost of incremental instability as the output is continuously refined. We experiment with a pipeline of industry-grade speech recognition and translation tools, augmented with simple inference heuristics to improve stability. We use TED Talks as a source of multilingual test data, developing our techniques on English-to-German spoken language translation. Our minimalist approach to simultaneous translation allows us to easily scale our final evaluation to six more target languages, dramatically improving incremental stability for all of them.
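The re-translation loop described in the abstract above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: `translate` stands in for any translation system (here a toy lookup table with an arbitrary language direction), and instability is measured by counting displayed tokens that a later output erases, which is one simple choice assumed here. The stabilizing inference heuristics mentioned in the abstract are not specified in this text and are not reproduced.

```python
def common_prefix_len(a, b):
    """Length of the longest shared prefix of two token lists."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def retranslate(source_prefixes, translate):
    """Translate every growing source prefix from scratch.

    Returns the list of outputs and the total number of previously
    displayed tokens that had to be erased when the output was revised.
    """
    shown = []          # tokens currently on screen
    total_erased = 0
    outputs = []
    for prefix in source_prefixes:
        new = translate(prefix)
        # Tokens after the shared prefix must be deleted before showing `new`.
        total_erased += len(shown) - common_prefix_len(shown, new)
        shown = new
        outputs.append(new)
    return outputs, total_erased


# Hypothetical toy "translator": a lookup table keyed by the source prefix.
TOY_TABLE = {
    ("ich",): ["I"],
    ("ich", "habe"): ["I", "have"],
    ("ich", "habe", "gegessen"): ["I", "ate"],
}


def toy_translate(prefix):
    return list(TOY_TABLE[tuple(prefix)])


source = ["ich", "habe", "gegessen"]
prefixes = [source[: i + 1] for i in range(len(source))]
outputs, erased = retranslate(prefixes, toy_translate)
print(outputs[-1], erased)
```

In the toy run the final output is unconstrained by earlier ones, which reflects the high final quality of re-translation, while the revision of "have" to "ate" shows up as one erased token.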
Reinforcement Learning Upside Down: Don't Predict Rewards -- Just Map Them to Actions
We transform reinforcement learning (RL) into a form of supervised learning (SL) by turning traditional RL on its head, calling this Upside Down RL (UDRL). Standard RL predicts rewards, while UDRL instead uses rewards as task-defining inputs, together with representations of time horizons and other computable functions of historic and desired future data. UDRL learns to interpret these input observations as commands, mapping them to actions (or action probabilities) through SL on past (possibly accidental) experience. UDRL generalizes to achieve high rewards or other goals, through input commands such as: get lots of reward within at most so much time! A separate paper [61] on first experiments with UDRL shows that even a pilot version of UDRL can outperform traditional baseline algorithms on certain challenging RL problems. We also introduce a related simple but general approach for teaching a robot to imitate humans. First videotape humans imitating the robot's current behaviors, then let the robot learn through SL to map the videos (as input commands) to these behaviors, then let it generalize and imitate videos of humans executing previously unknown behavior. This Imitate-Imitator concept may actually explain why biological evolution has resulted in parents who imitate the babbling of their babies.
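One possible reading of "mapping them to actions through SL on past experience" is to relabel a recorded episode in hindsight: for each segment of the episode, the reward actually collected and the time it took become the command, and the action actually taken becomes the supervised target. The sketch below illustrates only this data-construction step under that assumption; the function name and the `(observation, action, reward)` episode layout are hypothetical, and no learner is included.

```python
def make_training_examples(episode):
    """Turn one episode into supervised (input, target) pairs.

    `episode` is a list of (observation, action, reward) tuples.
    Each input is (observation, desired_return, desired_horizon) and
    each target is the action that was actually taken at that step.
    """
    examples = []
    for t1 in range(len(episode)):
        obs, action, _ = episode[t1]
        collected = 0.0
        for t2 in range(t1, len(episode)):
            collected += episode[t2][2]   # reward gathered from t1 through t2
            horizon = t2 - t1 + 1         # number of steps it took
            examples.append(((obs, collected, horizon), action))
    return examples


episode = [("s0", "a0", 1.0), ("s1", "a1", 0.0), ("s2", "a2", 2.0)]
examples = make_training_examples(episode)
print(len(examples))
```

A classifier or regressor fitted on such pairs could then be queried with a new command, for example a larger desired return, in place of the hindsight one.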
Cross-Language Aphasia Detection using Optimal Transport Domain Adaptation
Balagopalan, Aparna, Novikova, Jekaterina, McDermott, Matthew B. A., Nestor, Bret, Naumann, Tristan, Ghassemi, Marzyeh
Multi-language speech datasets are scarce and often have small sample sizes in the medical domain. Robust transfer of linguistic features across languages could improve rates of early diagnosis and therapy for speakers of low-resource languages when detecting health conditions from speech. We utilize out-of-domain, unpaired, single-speaker, healthy speech data for training multiple Optimal Transport (OT) domain adaptation systems. We learn mappings from other languages to English and detect aphasia from linguistic characteristics of speech, and show that OT domain adaptation improves aphasia detection over unilingual baselines for French (6% increased F1) and Mandarin (5% increased F1). Further, we show that adding aphasic data to the domain adaptation system significantly increases performance for both French and Mandarin, increasing the F1 scores further (10% and 8% increase in F1 scores for French and Mandarin, respectively, over unilingual baselines).
The Shallowness of Google Translate
One Sunday, at one of our weekly salsa sessions, my friend Frank brought along a Danish guest. I knew Frank spoke Danish well, since his mother was Danish, and he, as a child, had lived in Denmark. As for his friend, her English was fluent, as is standard for Scandinavians. However, to my surprise, during the evening's chitchat it emerged that the two friends habitually exchanged emails using Google Translate. Frank would write a message in English, then run it through Google Translate to produce a new text in Danish; conversely, she would write a message in Danish, then let Google Translate anglicize it.
Move over, Google Translate: Here come A.I. earbuds
Forget phrase books or even Google Translate. New translation devices are getting closer to replicating the fantasy of the Babel fish, which in the "Hitchhiker's Guide to the Galaxy" sits in one's ear and instantly translates any foreign language into the user's own. The WT2 Plus Ear to Ear AI Translator Earbuds from Timekettle are already available, while the over-the-ear "Ambassador" from Waverly Labs is scheduled for release this year. Both brands are wireless, and come with two earpieces that must be synced to a single smartphone connected to Wi-Fi or cellular data. These devices "bring us a bit closer to being able to travel to places in the world where people speak different languages and communicate smoothly with those who are living there," said Graham Neubig, an assistant professor at the Language Technologies Institute of Carnegie Mellon University and an expert in machine learning and natural language processing.
AWS adds 22 new languages to Amazon Translate ZDNet
Amazon Translate, Amazon Web Services' real-time translation service, is getting an update with support for 22 new languages. The announcement comes a week ahead of the AWS re:Invent conference, where AWS will promote Translate and a slew of other AI-powered tools for its cloud customers. AWS on Monday also announced new services related to image recognition, voice-based UIs and IoT. Amazon Translate now supports a total of 54 languages and dialects, with 2,804 language pairs now supported. The neural machine translation service enables customers to easily translate information from one language to many.
Samsung Research Centers Around the World Take First Place in Prestigious AI Challenges
Samsung Electronics' Global Research & Development (R&D) Centers play a key part in developing artificial intelligence (AI) capabilities for real-world usage. A credit to the work this advanced R&D branch of Samsung undertakes, both Samsung R&D Institute Poland and Samsung Research America AI Center have recently won two prestigious global challenges. This year, Samsung R&D Institute Poland won first place in two categories, the first being text-to-text translation from English to Czech and the second – an end-to-end system translating English speech into German text. For the text-to-text translation category, researchers worked to develop a model to translate the transcript of a spoken English-language TED Talk into Czech. Developing their winning model required the Samsung team to develop large, filtered corpora from which to work and generate as much synthetic data as possible.
DiscoTK: Using Discourse Structure for Machine Translation Evaluation
Joty, Shafiq, Guzman, Francisco, Marquez, Lluis, Nakov, Preslav
We present novel automatic metrics for machine translation evaluation that use discourse structure and convolution kernels to compare the discourse tree of an automatic translation with that of the human reference. We experiment with five transformations and augmentations of a base discourse tree representation based on the rhetorical structure theory, and we combine the kernel scores for each of them into a single score. Finally, we add other metrics from the ASIYA MT evaluation toolkit, and we tune the weights of the combination on actual human judgments. Experiments on the WMT12 and WMT13 metrics shared task datasets show correlation with human judgments that outperforms what the best systems that participated in these years achieved, both at the segment and at the system level.
22 New Languages And Variants, 6 New Regions For Amazon Translate Amazon Web Services
Just a few weeks ago, I told you about 7 new languages supported by Amazon Translate, our fully managed service for machine translation. Well, here I am again, announcing no fewer than 22 new languages and variants, as well as 6 additional AWS Regions where Translate is now available. Introducing 22 New Languages And Variants: that's what I call an update! In addition to existing languages, Translate now supports: Afrikaans, Albanian, Amharic, Azerbaijani, Bengali, Bosnian, Bulgarian, Croatian, Dari, Estonian, Canadian French, Georgian, Hausa, Latvian, Pashto, Serbian, Slovak, Slovenian, Somali, Swahili, Tagalog, and Tamil. Congratulations if you can name all countries and regions of origin: I couldn't!