Mbarara District
Data Augmentation With Back translation for Low Resource languages: A case of English and Luganda
Kimera, Richard, Heo, Dongnyeong, Rim, Daniela N., Choi, Heeyoul
In this paper, we explore the application of back translation (BT) as a semi-supervised technique to enhance Neural Machine Translation (NMT) models for the English-Luganda language pair, specifically addressing the challenges faced by low-resource languages. The purpose of our study is to demonstrate how BT can mitigate the scarcity of bilingual data by generating synthetic data from monolingual corpora. Our methodology involves developing custom NMT models using both publicly available and web-crawled data, and applying iterative and incremental back translation techniques. We strategically select datasets for incremental back translation across multiple small datasets, which is a novel element of our approach. The results of our study show significant improvements, with translation performance for the English-Luganda pair exceeding previous benchmarks by more than 10 BLEU points in all translation directions. Additionally, our evaluation incorporates comprehensive assessment metrics such as SacreBLEU, ChrF2, and TER, providing a nuanced understanding of translation quality. Our research confirms the efficacy of BT when strategically curated datasets are used, establishing new performance benchmarks and demonstrating the potential of BT in enhancing NMT models for low-resource languages.
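As a rough illustration of the back translation idea described in this abstract, the sketch below pairs real target-language sentences with machine-generated source sentences and appends them to the parallel data. The function names, the placeholder sentences, and the stand-in translator `toy_lg_to_en` are assumptions made for illustration; they are not the authors' models or data.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(
    target_monolingual: List[str],
    target_to_source: Callable[[str], str],
) -> List[Pair]:
    """Pair each real target sentence with a synthetic source sentence."""
    return [(target_to_source(t), t) for t in target_monolingual]


def augment(
    parallel: List[Pair],
    target_monolingual: List[str],
    target_to_source: Callable[[str], str],
) -> List[Pair]:
    """Return the real parallel data followed by the synthetic pairs."""
    return list(parallel) + back_translate(target_monolingual, target_to_source)


def toy_lg_to_en(sentence: str) -> str:
    """Hypothetical stand-in for a trained Luganda-to-English model."""
    return "EN(" + sentence + ")"


# Placeholder strings, not real corpus data.
real_pairs = [("en-sentence-0", "lg-sentence-0")]
luganda_only = ["lg-sentence-1", "lg-sentence-2"]

training_data = augment(real_pairs, luganda_only, toy_lg_to_en)
```

In an iterative setup, a forward model trained on `training_data` would in turn generate synthetic data for the reverse direction, and the cycle would repeat. That loop is omitted here because it depends on the actual training code.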
PaliGemma-CXR: A Multi-task Multimodal Model for TB Chest X-ray Interpretation
Musinguzi, Denis, Katumba, Andrew, Murindanyi, Sudi
Tuberculosis (TB) is an infectious global health challenge. Chest X-rays are a standard method for TB screening, yet many countries face a critical shortage of radiologists capable of interpreting these images. Machine learning offers an alternative, as it can automate tasks such as disease diagnosis and report generation. However, traditional approaches rely on task-specific models, which cannot utilize the interdependence between tasks. Building a multi-task model capable of performing multiple tasks poses additional challenges such as scarcity of multimodal data, dataset imbalance, and negative transfer. To address these challenges, we propose PaliGemma-CXR, a multi-task multimodal model capable of performing TB diagnosis, object detection, segmentation, report generation, and VQA. Starting with a dataset of chest X-ray images annotated with TB diagnosis labels and segmentation masks, we curated a multimodal dataset to support additional tasks. By finetuning PaliGemma on this dataset and sampling each task in proportion to the inverse of its dataset size, we achieved the following results across all tasks: 90.32% accuracy on TB diagnosis and 98.95% on close-ended VQA, a BLEU score of 41.3 on report generation, and mAP of 19.4 and 16.0 on object detection and segmentation, respectively. These results demonstrate that PaliGemma-CXR effectively leverages the interdependence between multiple image interpretation tasks to enhance performance.
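The inverse-size sampling mentioned in this abstract can be sketched as follows: each task is drawn with probability proportional to one over its dataset size, so smaller task datasets are sampled more often. The task names and sizes below are hypothetical, and the abstract does not give the authors' exact sampling procedure.

```python
import random
from typing import Dict, List


def inverse_size_weights(task_sizes: Dict[str, int]) -> Dict[str, float]:
    """Return sampling probabilities proportional to 1 / dataset size."""
    inverse = {task: 1.0 / n for task, n in task_sizes.items()}
    total = sum(inverse.values())
    return {task: w / total for task, w in inverse.items()}


def sample_tasks(task_sizes: Dict[str, int], k: int, seed: int = 0) -> List[str]:
    """Draw k task names according to the inverse-size probabilities."""
    rng = random.Random(seed)
    weights = inverse_size_weights(task_sizes)
    tasks = list(weights)
    return rng.choices(tasks, weights=[weights[t] for t in tasks], k=k)


# Hypothetical dataset sizes, chosen only to make the arithmetic easy to follow.
sizes = {"diagnosis": 4000, "vqa": 2000, "report_generation": 1000}
probs = inverse_size_weights(sizes)  # probabilities 1/7, 2/7 and 4/7
batch_tasks = sample_tasks(sizes, k=8)
```

With these sizes the smallest dataset is sampled four times as often as the largest, which is one simple way to counter dataset imbalance across tasks.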
Optimizing Vital Sign Monitoring in Resource-Constrained Maternal Care: An RL-Based Restless Bandit Approach
Boehmer, Niclas, Zhao, Yunfan, Xiong, Guojun, Rodriguez-Diaz, Paula, Cibrian, Paola Del Cueto, Ngonzi, Joseph, Boatin, Adeline, Tambe, Milind
Maternal mortality remains a significant global public health challenge. One promising approach to reducing maternal deaths occurring during facility-based childbirth is through early warning systems, which require the consistent monitoring of mothers' vital signs after giving birth. Wireless vital sign monitoring devices offer a labor-efficient solution for continuous monitoring, but their scarcity raises the critical question of how to allocate them most effectively. We devise an allocation algorithm for this problem by modeling it as a variant of the popular Restless Multi-Armed Bandit (RMAB) paradigm. In doing so, we identify and address novel, previously unstudied constraints unique to this domain, which render previous approaches for RMABs unsuitable and significantly increase the complexity of the learning and planning problem. To overcome these challenges, we adopt the popular Proximal Policy Optimization (PPO) algorithm from reinforcement learning to learn an allocation policy by training a policy and value function network. We demonstrate in simulations that our approach outperforms the best heuristic baseline by up to a factor of 4.
Democratizing AI in Africa: FL for Low-Resource Edge Devices
Fabila, Jorge, Campello, Víctor M., Martín-Isla, Carlos, Obungoloch, Johnes, Leo, Kinyera, Ronald, Amodoi, Lekadir, Karim
Africa faces significant challenges in healthcare delivery due to limited infrastructure and access to advanced medical technologies. This study explores the use of federated learning to overcome these barriers, focusing on perinatal health. We trained a fetal plane classifier using perinatal data from five African countries: Algeria, Ghana, Egypt, Malawi, and Uganda, along with data from Spanish hospitals. To reflect limited computational resources in the analysis, we considered a heterogeneous set of devices, including a Raspberry Pi and several laptops, for model training. We demonstrate comparable performance between a centralized and a federated model, despite the compute limitations, and a significant improvement in model generalizability when compared to models trained only locally. These results show the potential for a future large-scale implementation of a federated learning platform to bridge the accessibility gap and improve model generalizability with very few requirements.
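The abstract does not say which aggregation rule the federated model uses. As a minimal sketch, assuming the common federated averaging scheme, a server could combine the parameters trained on each device by weighting them with the number of local samples. All names and numbers below are illustrative only.

```python
from typing import List


def federated_average(
    client_weights: List[List[float]],
    client_sizes: List[int],
) -> List[float]:
    """Average client parameter vectors, weighted by local sample counts."""
    if len(client_weights) != len(client_sizes):
        raise ValueError("one sample count is needed per client")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical clients, e.g. a laptop and a Raspberry Pi, with toy parameters.
weights = [[1.0, 2.0], [3.0, 4.0]]
sample_counts = [1, 3]
global_weights = federated_average(weights, sample_counts)
```

Only the parameter vectors and sample counts leave each device in this scheme, which is why it suits settings where raw clinical data cannot be pooled centrally.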
Seeds of Stereotypes: A Large-Scale Textual Analysis of Race and Gender Associations with Diseases in Online Sources
Hansen, Lasse Hyldig, Andersen, Nikolaj, Gallifant, Jack, McCoy, Liam G., Stone, James K, Izath, Nura, Aguirre-Jerez, Marcela, Bitterman, Danielle S, Gichoya, Judy, Celi, Leo Anthony
Background: Advancements in Large Language Models (LLMs) hold transformative potential in healthcare; however, recent work has raised concern about the tendency of these models to produce outputs that display racial or gender biases. Although training data is a likely source of such biases, exploration of disease and demographic associations in text data at scale has been limited. Methods: We conducted a large-scale textual analysis using a dataset comprising diverse web sources, including arXiv, Wikipedia, and Common Crawl. The study analyzed the context in which various diseases are discussed alongside markers of race and gender. Given that LLMs are pre-trained on similar datasets, this approach allowed us to examine the potential biases that LLMs may learn and internalize. We compared these findings with actual demographic disease prevalence as well as GPT-4 outputs in order to evaluate the extent of bias representation. Results: Our findings indicate that demographic terms are disproportionately associated with specific disease concepts in online texts. Gender terms are prominently associated with disease concepts, while racial terms are much less frequently associated. We find widespread disparities in the associations of specific racial and gender terms with the 18 diseases analyzed. Most prominently, we see an overall significant overrepresentation of Black race mentions in comparison to population proportions. Conclusions: Our results highlight the need for critical examination and transparent reporting of biases in LLM pretraining datasets. Our study suggests the need to develop mitigation strategies to counteract the influence of biased training data in LLMs, particularly in sensitive domains such as healthcare.
Ugandan medics deploy AI to stop women dying after childbirth
NAIROBI, Jan 31 (Thomson Reuters Foundation) - Ugandan doctors are giving new mothers artificial intelligence-enabled devices to remotely monitor their health in a first-of-its-kind study aiming to curb thousands of preventable maternal deaths across Africa, medics and developers said. Doctors at Mbarara Hospital in western Uganda will give devices to more than 1,000 women who have undergone caesarean section births to wear on their upper arms at all times. Algorithms detect at-risk cases and alert doctors. Joseph Ngonzi from Mbarara University of Science and Technology, which is conducting the study, said it would help "improve monitoring in a resource-constrained environment". The World Health Organization says almost 300,000 women worldwide die annually from preventable causes related to pregnancy and childbirth - that's more than 800 women every day.
Fraud fighters and bamboo bikes: the African innovators driving change
The Royal Academy of Engineering's Africa prize, now in its sixth year, is the continent's biggest award for engineering innovation. Sixteen African inventors from six countries – including, for the first time, Malawi – have been shortlisted to receive funding, training and mentoring for projects intended to revolutionise sectors ranging from agriculture and banking to women's health. The winner will be awarded £25,000 and the three runners-up will receive £10,000 each. This year's inventions include facial recognition software to prevent financial fraud, a low-cost digital microscope to speed up cervical cancer diagnosis, and two separate innovations made from water hyacinth plants. Four inventors spoke to the Guardian about their innovations and their plans to change Africa for the better.