Conakry Region
Machine Translation for Nko: Tools, Corpora and Baseline Results
Doumbouya, Moussa Koulako Bala, Diané, Baba Mamadi, Cissé, Solo Farabado, Diané, Djibrila, Sow, Abdoulaye, Doumbouya, Séré Moussa, Bangoura, Daouda, Bayo, Fodé Moriba, Condé, Ibrahima Sory 2., Diané, Kalo Mory, Piech, Chris, Manning, Christopher
[...] over 40 million people across West African countries including Mali, Guinea, Ivory Coast, Gambia, Burkina Faso, Sierra Leone, Senegal, Liberia, and Guinea-Bissau. Nko, which means 'I say' in all Manding languages, was developed as both the Manding literary standard language and a writing system by Soulemana Kanté in 1949 for the purpose of sustaining the strong oral tradition of Manding languages (Niane, 1974; Conde, 2017; Eberhard et al., 2023). Unfortunately, to date, there isn't any usable machine translation (MT) system for Nko, in part due to the unavailability of large text corpora required by state-of-the-art neural machine translation (NMT) algorithms. Nko is a representative case study of the broader issues that interfere with the goal of universal machine translation. Thousands of languages still don't have available or usable MT systems, mainly due to the unavailability of high-quality parallel [...]
Flickr Africa: Examining Geo-Diversity in Large-Scale, Human-Centric Visual Data
Naggita, Keziah, LaChance, Julienne, Xiang, Alice
Biases in large-scale image datasets are known to influence the performance of computer vision models as a function of geographic context. To investigate the limitations of standard Internet data collection methods in low- and middle-income countries, we analyze human-centric image geo-diversity on a massive scale using geotagged Flickr images associated with each nation in Africa. We report the quantity and content of available data with comparisons to population-matched nations in Europe as well as the distribution of data according to fine-grained intra-national wealth estimates. Temporal analyses are performed at two-year intervals to expose emerging data trends. Furthermore, we present findings for an ``othering'' phenomenon as evidenced by a substantial number of images from Africa being taken by non-local photographers. The results of our study suggest that further work is required to capture image data representative of African people and their environments and, ultimately, to improve the applicability of computer vision models in a global context.
Modelling spatio-temporal trends of air pollution in Africa
Gahungu, Paterne, Kubwimana, Jean Remy, Muhimpundu, Lionel Jean Marie Benjamin, Ndamuzi, Egide
Atmospheric pollution remains one of the major public health threats worldwide, with an estimated 7 million deaths annually. In Africa, rapid urbanization and poor transport infrastructure are worsening the problem. In this paper, we analyse spatio-temporal variations of PM2.5 across different geographical regions in Africa. The West African region remains the most affected by high pollution levels, with a daily average of 40.856 $\mu g/m^3$ in some cities like Lagos, Abuja and Bamako. In East Africa, Uganda reports the highest pollution level, with a daily average concentration of 56.14 $\mu g/m^3$, while Kigali records 38.65 $\mu g/m^3$. In the central region of Africa, the highest daily average PM2.5 concentration, 90.075 $\mu g/m^3$, was recorded in N'Djamena. We compare three data-driven models for predicting future trends in pollution levels; the neural network outperforms the Gaussian process and ARIMA models.
Using Radio Archives for Low-Resource Speech Recognition: Towards an Intelligent Virtual Assistant for Illiterate Users
Doumbouya, Moussa, Einstein, Lisa, Piech, Chris
For many of the 700 million illiterate people around the world, speech recognition technology could provide a bridge to valuable information and services. Yet, those most in need of this technology are often the most underserved by it. In many countries, illiterate people tend to speak only low-resource languages, for which the datasets necessary for speech technology development are scarce. In this paper, we investigate the effectiveness of unsupervised speech representation learning on noisy radio broadcasting archives, which are abundant even in low-resource languages. We make three core contributions. First, we release two datasets to the research community. The first, West African Radio Corpus, contains 142 hours of audio in more than 10 languages with a labeled validation subset. The second, West African Virtual Assistant Speech Recognition Corpus, consists of 10K labeled audio clips in four languages. Next, we share West African wav2vec, a speech encoder trained on the noisy radio corpus, and compare it with the baseline Facebook speech encoder trained on six times more data of higher quality. We show that West African wav2vec performs similarly to the baseline on a multilingual speech recognition task, and significantly outperforms the baseline on a West African language identification task. Finally, we share the first-ever speech recognition models for Maninka, Pular and Susu, languages spoken by a combined 10 million people in over seven countries, including six where the majority of the adult population is illiterate. Our contributions offer a path forward for ethical AI research to serve the needs of those most disadvantaged by the digital divide.
The Internet of the Orals
Internet services like social media, online discussion forums, and crowdsourcing marketplaces have transformed how people participate in the information ecology and digital economy. These services empower mostly urban, affluent, and literate people, and improve their access to information and their ability to meet instrumental needs. However, these services currently exclude billions of people worldwide who are too poor to afford Internet-enabled devices, too remote to access the Internet, or too low-literate to navigate the mostly text-driven Internet. In India and Pakistan alone, there are nearly 1.1 billion people offline. Although 70% of their populations have access to mobile phones, most people still use basic or feature phones, which run custom operating systems, making it difficult to extend existing Internet services to these devices.