Machine Translation
What is Deep Learning? Getting Started With Deep Learning | Edureka
We all know that Google can translate between 100 different human languages almost instantly, as if by magic. The technology behind Google Translate is called machine translation, and it has been a savior for people who can't communicate with each other because they speak different languages. You might think that this feature has been around for a long time, so what's new? Over the past two years, with the help of deep learning, Google has completely overhauled the approach to machine translation in Google Translate. In fact, deep learning researchers who know almost nothing about language translation are putting forward relatively simple machine learning solutions that are beating the best expert-built language translation systems in the world.
Improving the Performance of Online Neural Transducer Models
Sainath, Tara N., Chiu, Chung-Cheng, Prabhavalkar, Rohit, Kannan, Anjuli, Wu, Yonghui, Nguyen, Patrick, Chen, Zhifeng
ABSTRACT Having a sequence-to-sequence model which can operate in an online fashion is important for streaming applications such as Voice Search. The neural transducer (NT) is a streaming sequence-to-sequence model, but it has shown a significant degradation in performance compared to non-streaming models such as Listen, Attend and Spell (LAS). Specifically, we look at increasing the window over which NT computes attention, mainly by looking backwards in time so the model still remains online. In addition, we explore initializing an NT model from a LAS-trained model so that it is guided with a better alignment. Finally, we explore including stronger language models, such as using wordpiece models and applying an external LM during the beam search. On a Voice Search task, we find that with these improvements we can get NT to match the performance of LAS. 1. INTRODUCTION Sequence-to-sequence models have become popular in the automatic speech recognition (ASR) community [1, 2, 3, 4], as they allow for one neural network to jointly learn an acoustic, pronunciation and language model, greatly simplifying the ASR pipeline.
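The abstract above mentions widening the window over which NT computes attention by looking backwards in time, so the model never needs future audio. The following is a minimal sketch of that idea only, not the paper's implementation: the names `chunk_size` and `look_back_chunks`, the helper functions, and the plain dot-product scoring are all assumptions made for illustration.

```python
import math


def attention_window(chunk_index, chunk_size, look_back_chunks, num_frames):
    """Return the [start, end) encoder-frame range visible at a given chunk.

    Hypothetical helper: the window covers the current chunk plus
    `look_back_chunks` earlier chunks, and never any future frame.
    """
    start = max(0, (chunk_index - look_back_chunks) * chunk_size)
    end = min(num_frames, (chunk_index + 1) * chunk_size)
    return start, end


def attend(query, encoder_frames, start, end):
    """Dot-product attention restricted to frames in [start, end).

    Dot-product scoring is an assumption; the source does not say
    which scoring function is used.
    """
    scores = [
        sum(q * h for q, h in zip(query, encoder_frames[t]))
        for t in range(start, end)
    ]
    # Subtract the maximum score for a numerically stable softmax.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The context vector is the attention-weighted sum of visible frames.
    context = [
        sum(w * encoder_frames[start + i][d] for i, w in enumerate(weights))
        for d in range(len(query))
    ]
    return weights, context
```

With `look_back_chunks=0` the window reduces to the current chunk alone; raising it lets the decoder see more past context while the upper bound stays at the end of the current chunk.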
For The First Time, AI Can Teach Itself Any Language On Earth
To understand the potential of these new systems, it helps to know how current machine translation works. The current de facto standard is Google Translate, a system that covers 103 languages from Afrikaans to Zulu, including the top 10 languages in the world (in order: Mandarin, Spanish, English, Hindi, Bengali, Portuguese, Russian, Japanese, German, and Javanese). Google's system uses human-supervised neural networks that compare parallel texts: books and articles that have been previously translated by humans. By comparing extremely large amounts of these parallel texts, Google Translate learns the equivalences between any two given languages, thus acquiring the ability to quickly translate between them. Sometimes the translations are funny or don't really capture the original meaning but, in general, they are functional and, over time, they're getting better and better.
Is Google Translate sexist?
Several users have taken to Twitter to complain about Google's sexist translations in its Translate tool. When translating phrases from gender-neutral languages including Turkish and Finnish, users noticed that Google gave male pronouns to certain professions, such as police, engineer and leader. In contrast, female pronouns were given to jobs including secretary, nanny and nurse. The reason for this bias remains unclear, and Google is yet to respond to requests for comment. Google Translate's automated service can translate over 100 languages.
Artificial intelligence goes bilingual--without a dictionary
Computers might soon translate between many more languages. Automatic language translation has come a long way, thanks to neural networks--computer algorithms that take inspiration from the human brain. But training such networks requires an enormous amount of data: millions of sentence-by-sentence translations to demonstrate how a human would do it. Now, two new papers show that neural networks can learn to translate with no parallel texts--a surprising advance that could make documents in many languages more accessible. "Imagine that you give one person lots of Chinese books and lots of Arabic books--none of them overlapping--and the person has to learn to translate Chinese to Arabic. That seems impossible, right?" says the first author of one study, Mikel Artetxe, a computer scientist at the University of the Basque Country (UPV) in San Sebastián, Spain.
AI's sharing economy: Microsoft creates publicly available datasets
Samira Ebrahimi Kahou and her colleagues at Microsoft Research Maluuba recently set out to solve an interesting research problem: How could they use artificial intelligence to correctly reason about information found in graphs and pie charts? One big obstacle, they discovered, was that the research area was so new that there weren't any existing datasets available for them to test their hypotheses. The FigureQA dataset, which the team released publicly earlier this fall, is one of a number of datasets, metrics and other tools for testing AI systems that Microsoft researchers and engineers have created and shared in recent years. Researchers all over the world use them to see how well their AI systems do at everything from translating conversational speech to predicting the next word a person may want to type. The teams say these tools provide a codified way for everyone from academic researchers to industry experts to test their systems, compare their work and learn from each other.
Investors Shovel Millions into Natural Language Processing Slator
Among the vast business applications of artificial intelligence, Slator has been keeping a close eye on neural machine translation (MT). However, the boundaries between MT and broader tech like natural language processing (NLP) are sometimes fuzzy. The services resulting from these technologies are often adjacent: translation on one side and chatbots on another. In fact, some companies combine them into a single service--multilingual chatbots, for instance. This is why a recent slew of significant funding rounds in the NLP space has caught our attention. In June 2017, Italy-based venture incubator H-Farm acquired language technology services provider CELI, also headquartered in Italy, in a leveraged buyout of 100% of its shares.
Word embeddings in 2017: Trends and future directions
The word2vec method based on skip-gram with negative sampling (Mikolov et al., 2013) [49] was published in 2013 and had a large impact on the field, mainly through its accompanying software package, which enabled efficient training of dense word representations and a straightforward integration into downstream models. In some respects, we have come far since then: word embeddings have established themselves as an integral part of Natural Language Processing (NLP) models. In other respects, we might as well be in 2013, as we have not found ways to pre-train word embeddings that have managed to supersede the original word2vec. This post will focus on the deficiencies of word embeddings and how recent approaches have tried to resolve them. If not otherwise stated, this post discusses pre-trained word embeddings, i.e. word representations that have been learned on a large corpus using word2vec and its variants.
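For readers unfamiliar with skip-gram with negative sampling, here is a rough sketch of its two core pieces: extracting (center, context) training pairs from a token sequence, and the per-pair loss that rewards a high score for the true context word and low scores for sampled "negative" words. This is an illustration under simplifying assumptions, not the word2vec package itself: the function names are hypothetical, vectors are plain Python lists, and the sampling of negatives and the gradient updates are left out.

```python
import math


def skipgram_pairs(tokens, window=2):
    """Collect (center, context) pairs within `window` positions of each token."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sgns_loss(center_vec, context_vec, negative_vecs):
    """Negative-sampling loss for one (center, context) pair.

    The true context word is pushed toward a high dot product with the
    center word; each sampled negative word is pushed toward a low one.
    """
    loss = -math.log(sigmoid(dot(center_vec, context_vec)))
    for neg in negative_vecs:
        loss -= math.log(sigmoid(-dot(center_vec, neg)))
    return loss
```

For example, `skipgram_pairs(["the", "cat", "sat"], window=1)` pairs each word only with its immediate neighbours. In a full trainer, the negatives would be drawn from a noise distribution over the vocabulary and the loss minimized by stochastic gradient descent.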