Meta's quest to translate underserved languages has marked its first victory with the open source release of a language model able to decipher 202 languages. Named after Meta's No Language Left Behind initiative and dubbed NLLB-200, the model is the first able to translate so many languages, according to its makers, with the goal of improving translation for languages overlooked by similar projects. "The vast majority of improvements made in machine translation in the last decades have been for high-resource languages," Meta researchers wrote in a paper [PDF]. "While machine translation continues to grow, the fruits it bears are unevenly distributed," they said. According to the announcement of NLLB-200, the model can translate 55 African languages "with high-quality results."
"Broadly accessible machine translation systems support around 130 languages; our goal is to bring this number up to 200," the authors write as their mission statement. Meta Platforms, owner of Facebook, Instagram and WhatsApp, on Wednesday unveiled its latest effort in machine translation: a 190-page opus describing how it has used deep learning forms of neural nets to double state-of-the-art translation performance across 202 languages, many of them so-called "low-resource" languages such as West Central Oromo, a language of the Oromia state of Ethiopia; Tamasheq, spoken in Algeria and several other parts of Northern Africa; and Waray, the language of the Waray people of the Philippines. The report by a team of researchers at Meta, along with scholars at UC Berkeley and Johns Hopkins, "No Language Left Behind: Scaling Human-Centered Machine Translation," is posted on Facebook's AI research Web site, along with a companion blog post, and both should be required reading for the rich detail they offer on the matter. Meta is open-sourcing its data sets and neural network model code on GitHub, and is also offering $200,000 in awards to outside uses of the technology.
Language is our lifeline to the world. But because high-quality translation tools don't exist for hundreds of languages, billions of people today can't access digital content or participate fully in conversations and communities online in their preferred or native languages. This is particularly an issue for hundreds of millions of people who speak the many languages of Africa and Asia. To help people connect better today and be part of the metaverse of tomorrow, our AI researchers created No Language Left Behind (NLLB), an effort to develop high-quality machine translation capabilities for most of the world's languages. Today, we're announcing an important breakthrough in NLLB: We've built a single AI model called NLLB-200, which translates 200 different languages with results far more accurate than what previous technology could accomplish.
Google is helping the Wikimedia Foundation achieve its goal of making Wikipedia articles available in a lot more languages. The Foundation has added Google Translate to its content translation tool, which human editors can use to add content to non-English Wikipedia websites. Those editors can take advantage of the new option -- "one of the most advanced machine translation systems available today," the foundation called it -- to generate an initial translation that they can then review and edit for readability in their language. The Foundation says volunteer Wikipedia editors have been asking for Google Translate integration for a long time now. According to VentureBeat, this move is an expansion of an earlier partnership, wherein Google promised to help Wikipedia make its English posts more accessible in Indonesia.
Artificial intelligence (AI) must be inclusive to reach its potential. AI applications that solve problems for only a small segment of the population will fail to achieve widespread adoption. So it's important that AI applications be designed and built with data that reflects as many segments of the global population as possible. Many moving parts must be managed well to achieve that, and one of them is language: the more languages an AI application can handle, the more inclusive it is.