Collaborating Authors

The first AI model that translates 100 languages without relying on English data


Facebook AI is introducing, M2M-100 the first multilingual machine translation (MMT) model that translates between any pair of 100 languages without relying on English data. When translating, say, Chinese to French, previous best multilingual models train on Chinese to English and English to French, because English training data is the most widely available. Our model directly trains on Chinese to French data to better preserve meaning. It outperforms English-centric systems by 10 points on the widely used BLEU metric for evaluating machine translations. M2M-100 is trained on a total of 2,200 language directions -- or 10x more than previous best, English-centric multilingual models.

Meta's machine translation journey


There are around 7000 languages spoken globally, but most translation models focus on English and other popular languages. This excludes a major part of the world from the benefit of having access to content, technologies and other advantages of being online. Tech giants are trying to bridge this gap. Just days back, Meta announced that it plans to bring out a Universal Speech Translator to translate speech from one language to another in real-time. This announcement is not surprising to anyone who follows the company closely. Meta has been devoted to bringing innovations in machine translations for quite some time now.

Azure AI empowers organizations to serve users in more than 100 languages


Microsoft announced today that 12 new languages and dialects have been added to Translator. These additions mean that the service can now translate between more than 100 languages and dialects, making information in text and documents accessible to 5.66 billion people worldwide. "One hundred languages is a good milestone for us to achieve our ambition for everyone to be able to communicate regardless of the language they speak," said Xuedong Huang, Microsoft technical fellow and Azure AI chief technology officer. Translator today covers the world's most spoken languages including English, Chinese, Hindi, Arabic and Spanish. In recent years, advances in AI technology have allowed the company to grow its language library with low-resource and endangered languages, such as Inuktitut, a dialect of Inuktut that is spoken by about 40,000 Inuit in Canada.

Teaching AI to translate 100s of spoken and written languages in real time


For people who understand languages like English, Mandarin, or Spanish, it may seem like today's apps and web tools already provide the translation technology we need. But billions of people are being left out -- unable to easily access most of the information on the internet or connect with most of the online world in their native language. Today's machine translation (MT) systems are improving rapidly, but they still rely heavily on learning from large amounts of textual data, so they do not generally work well for low-resource languages, i.e., languages that lack training data, and for languages that don't have a standardized writing system. Eliminating language barriers would be profound, making it possible for billions of people to access information online in their native or preferred languages. Advances in MT won't just help those people who don't speak one of the languages that dominates the internet today; they'll also fundamentally change the way people in the world connect and share ideas.

How Google is using emerging AI techniques to improve language translation quality


Google says it's made progress toward improving translation quality for languages that don't have a copious amount of written text. In a forthcoming blog post, the company details new innovations that have enhanced the user experience in the 108 languages (particularly in data-poor languages Yoruba and Malayalam) supported by Google Translate, its service that translates an average of 150 billion words daily. In the 13 years since the public debut of Google Translate, techniques like neural machine translation, rewriting-based paradigms, and on-device processing have led to quantifiable leaps in the platform's translation accuracy. But until recently, even the state-of-the-art algorithms underpinning Translate lagged behind human performance. Efforts beyond Google illustrate the magnitude of the problem -- the Masakhane project, which aims to render thousands of languages on the African continent automatically translatable, has yet to move beyond the data-gathering and transcription phase.