Abeokuta
NaijaNLP: A Survey of Nigerian Low-Resource Languages
With over 500 languages in Nigeria, three languages -- Hausa, Yor\`ub\'a and Igbo -- spoken by over 175 million people, account for about 60% of the spoken languages. However, these languages are categorised as low-resource due to insufficient resources to support tasks in computational linguistics. Several research efforts and initiatives have been presented, however, a coherent understanding of the state of Natural Language Processing (NLP) - from grammatical formalisation to linguistic resources that support complex tasks such as language understanding and generation is lacking. This study presents the first comprehensive review of advancements in low-resource NLP (LR-NLP) research across the three major Nigerian languages (NaijaNLP). We quantitatively assess the available linguistic resources and identify key challenges. Although a growing body of literature addresses various NLP downstream tasks in Hausa, Igbo, and Yor\`ub\'a, only about 25.1% of the reviewed studies contribute new linguistic resources. This finding highlights a persistent reliance on repurposing existing data rather than generating novel, high-quality resources. Additionally, language-specific challenges, such as the accurate representation of diacritics, remain under-explored. To advance NaijaNLP and LR-NLP more broadly, we emphasise the need for intensified efforts in resource enrichment, comprehensive annotation, and the development of open collaborative initiatives.
Integrating Boosted learning with Differential Evolution (DE) Optimizer: A Prediction of Groundwater Quality Risk Assessment in Odisha
Subudhi, Sonalika, Pati, Alok Kumar, Bose, Sephali, Sahoo, Subhasmita, Pattanaik, Avipsa, Acharya, Biswa Mohan
Groundwater is eventually undermined by human exercises, such as fast industrialization, urbanization, over-extraction, and contamination from agrarian and urban sources. From among the different contaminants, the presence of heavy metals like cadmium (Cd), chromium (Cr), arsenic (As), and lead (Pb) proves to have serious dangers when present in huge concentrations in groundwater. Long-term usage of these poisonous components may lead to neurological disorders, kidney failure and different sorts of cancer. To address these issues, this study developed a machine learning-based predictive model to evaluate the Groundwater Quality Index (GWQI) and identify the main contaminants which are affecting the water quality. It has been achieved with the help of a hybrid machine learning model i.e. LCBoost Fusion . The model has undergone several processes like data preprocessing, hyperparameter tuning using Differential Evolution (DE) optimization, and evaluation through cross-validation. The LCBoost Fusion model outperforms individual models (CatBoost and LightGBM), by achieving low RMSE (0.6829), MSE (0.5102), MAE (0.3147) and a high R$^2$ score of 0.9809. Feature importance analysis highlights Potassium (K), Fluoride (F) and Total Hardness (TH) as the most influential indicators of groundwater contamination. This research successfully demonstrates the application of machine learning in assessing groundwater quality risks in Odisha. The proposed LCBoost Fusion model offers a reliable and efficient approach for real-time groundwater monitoring and risk mitigation. These findings will help the environmental organizations and the policy makers to map out targeted places for sustainable groundwater management. Future work will focus on using remote sensing data and developing an interactive decision-making system for groundwater quality assessment.
Bridging Gaps in Natural Language Processing for Yor\`ub\'a: A Systematic Review of a Decade of Progress and Prospects
Jimoh, Toheeb A., De Wille, Tabea, Nikolov, Nikola S.
Natural Language Processing (NLP) is becoming a dominant subset of artificial intelligence as the need to help machines understand human language looks indispensable. Several NLP applications are ubiquitous, partly due to the myriads of datasets being churned out daily through mediums like social networking sites. However, the growing development has not been evident in most African languages due to the persisting resource limitation, among other issues. Yor\`ub\'a language, a tonal and morphologically rich African language, suffers a similar fate, resulting in limited NLP usage. To encourage further research towards improving this situation, this systematic literature review aims to comprehensively analyse studies addressing NLP development for Yor\`ub\'a, identifying challenges, resources, techniques, and applications. A well-defined search string from a structured protocol was employed to search, select, and analyse 105 primary studies between 2014 and 2024 from reputable databases. The review highlights the scarcity of annotated corpora, limited availability of pre-trained language models, and linguistic challenges like tonal complexity and diacritic dependency as significant obstacles. It also revealed the prominent techniques, including rule-based methods, among others. The findings reveal a growing body of multilingual and monolingual resources, even though the field is constrained by socio-cultural factors such as code-switching and desertion of language for digital usage. This review synthesises existing research, providing a foundation for advancing NLP for Yor\`ub\'a and in African languages generally. It aims to guide future research by identifying gaps and opportunities, thereby contributing to the broader inclusion of Yor\`ub\'a and other under-resourced African languages in global NLP advancements.
Benchmarking Randomized Optimization Algorithms on Binary, Permutation, and Combinatorial Problem Landscapes
Odeyemi, Jethro, Zhang, Wenjun
In this paper, we evaluate the performance of four randomized optimization algorithms: Randomized Hill Climbing (RHC), Simulated Annealing (SA), Genetic Algorithms (GA), and MIMIC (Mutual Information Maximizing Input Clustering), across three distinct types of problems: binary, permutation, and combinatorial. We systematically compare these algorithms using a set of benchmark fitness functions that highlight the specific challenges and requirements of each problem category. Our study analyzes each algorithm's effectiveness based on key performance metrics, including solution quality, convergence speed, computational cost, and robustness. Results show that while MIMIC and GA excel in producing high-quality solutions for binary and combinatorial problems, their computational demands vary significantly. RHC and SA, while computationally less expensive, demonstrate limited performance in complex problem landscapes. The findings offer valuable insights into the trade-offs between different optimization strategies and provide practical guidance for selecting the appropriate algorithm based on the type of problems, accuracy requirements, and computational constraints.
Large Language Models in Ambulatory Devices for Home Health Diagnostics: A case study of Sickle Cell Anemia Management
Ogundare, Oluwatosin, Sofolahan, Subuola
This study investigates the potential of an ambulatory device that incorporates Large Language Models (LLMs) in cadence with other specialized ML models to assess anemia severity in sickle cell patients in real time. The device would rely on sensor data that measures angiogenic material levels to assess anemia severity, providing real-time information to patients and clinicians to reduce the frequency of vaso-occlusive crises because of the early detection of anemia severity, allowing for timely interventions and potentially reducing the likelihood of serious complications. The main challenges in developing such a device are the creation of a reliable non-invasive tool for angiogenic level assessment, a biophysics model and the practical consideration of an LLM communicating with emergency personnel on behalf of an incapacitated patient. A possible system is proposed, and the limitations of this approach are discussed.