Africa has over 2,000 languages, but these languages are not well represented in the existing Natural Language Processing (NLP) ecosystem. One challenge is the lack of useful African-language datasets that can be used to solve different social and economic problems. In this article, I have compiled a list of African language datasets from across the web. You can use these datasets in various NLP tasks such as text classification, named entity recognition, machine translation, sentiment analysis, speech recognition, and topic modeling. I've made this collection of datasets public to give you an opportunity to use your skills and help solve different challenges.
Meta has open-sourced an AI model that can translate across 200 different languages, the company announced Wednesday -- a move that should open up different technologies and digital content to a much wider audience. The model, called No Language Left Behind (NLLB), covers those 200 languages, including 55 African languages, with high-quality results. "A handful of languages -- including English, Mandarin, Spanish and Arabic -- dominate the web," the company noted in a blog post. "Native speakers of these very widely spoken languages may take for granted how meaningful it is to read something in your own mother tongue. NLLB will help more people read things in their preferred language, rather than always requiring an intermediary language that often gets the sentiment or content wrong." Meta is of course using NLLB to improve its own products, but by open-sourcing the model, technologists can use it to build other tools -- like an AI assistant that works well in languages such as Javanese and Uzbek, or closed captioning in Swahili or Oromo for Bollywood movies.
Facebook has developed an artificial intelligence capable of accurately translating between any pair of 100 languages without relying on first translating to English, as many existing systems do. The AI outperforms such systems by 10 points on a 100-point scale used by academics to automatically evaluate the quality of machine translations. Translations produced by the model were also assessed by humans, who scored it as around 90 per cent accurate. Facebook's system was trained on a data set of 7.5 billion sentence pairs gathered from the web across 100 languages, though not all the languages had an equal number of sentence pairs. "What I really was interested in was cutting out English as a middle man. Globally there are plenty of regions where they speak two languages that aren't English," says Angela Fan of Facebook AI, who led the work.
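The quote above hints at why skipping the English pivot matters: information that English does not encode (for example, the formal/informal distinction many languages make in "you") is destroyed at the pivot step. The toy Python sketch below illustrates this with hypothetical one-word dictionaries; it is not Facebook's model, just a minimal example of the failure mode, assuming French-to-Spanish as the language pair.

```python
# Toy illustration (not Facebook's system): why pivoting through English
# can lose information that a direct translation preserves.
# All dictionaries here are hypothetical one-word examples.

# French "tu" (informal) and "vous" (formal) both collapse to English "you".
fr_to_en = {"tu": "you", "vous": "you"}

# Spanish keeps the distinction: informal "tú" vs. formal "usted".
fr_to_es_direct = {"tu": "tú", "vous": "usted"}

# A pivot system must pick one Spanish form for English "you",
# so the formality signal is gone by the time Spanish is produced.
en_to_es = {"you": "tú"}

def pivot_translate(word_fr: str) -> str:
    """French -> English -> Spanish: formality is lost at the pivot."""
    return en_to_es[fr_to_en[word_fr]]

def direct_translate(word_fr: str) -> str:
    """French -> Spanish directly: formality is preserved."""
    return fr_to_es_direct[word_fr]

print(pivot_translate("vous"))   # tú   (formal register lost)
print(direct_translate("vous"))  # usted (formal register kept)
```

The same effect shows up at scale in real systems: any distinction the intermediary language does not represent cannot be recovered on the other side of the pivot.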
Amazon has introduced a new Live Translation feature for Alexa, enabling real-time translation between certain languages in both voice and text form. The feature uses the same AI models as Alexa's bilingual understanding to recognize which of a pair of languages is being spoken and translate into the other. Right now, translations are limited to pairs of English with French, Spanish, Hindi, German, Italian, or Brazilian Portuguese. Live Translation is available on any Echo device by asking Alexa in English to translate German, French, or any of the other supported languages. When the voice assistant beeps, the user can speak either language naturally and Alexa will subsequently repeat back what was said in the other language.
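The control flow described above can be sketched in a few lines: detect which side of the enabled language pair an utterance belongs to, then emit it in the other language. The Python below is a minimal sketch of that loop, assuming tiny hypothetical English-German phrase tables; it is not Amazon's implementation, which uses neural speech and translation models rather than lookups.

```python
# Minimal sketch (assumed logic, not Amazon's implementation) of a
# two-way translation loop for one enabled language pair.

# Hypothetical toy phrase tables for an English<->German session.
EN_DE = {"good morning": "guten Morgen", "thank you": "danke"}
DE_EN = {value: key for key, value in EN_DE.items()}

def detect_side(utterance: str) -> str:
    """Guess which side of the pair the utterance was spoken in."""
    if utterance in EN_DE:
        return "en"
    if utterance in DE_EN:
        return "de"
    raise ValueError("utterance not found in either toy phrase table")

def live_translate(utterance: str) -> str:
    """Repeat the utterance back in the other language of the pair."""
    side = detect_side(utterance)
    return EN_DE[utterance] if side == "en" else DE_EN[utterance]

print(live_translate("thank you"))     # danke
print(live_translate("guten Morgen"))  # good morning
```

The key design point the sketch captures is that the user never declares which language they are speaking turn by turn; the system infers the side of the pair and always translates toward the other one.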