 Kenya


In pictures: Prayers and reflection mark Eid celebrations around the world

BBC News

Muslims around the world have begun celebrating Eid al-Fitr, one of the biggest celebrations in the Islamic calendar. Eid al-Fitr - which means "festival of the breaking of the fast" - is celebrated at the end of Ramadan, a month of fasting for many adults, as well as spiritual reflection and prayer.

[Photo captions: worshippers preparing for prayer in Moscow; hundreds taking part in prayers at Tononoka grounds in Mombasa, Kenya; prayers observed at a stadium in Port Sudan in eastern Sudan; young children joining adults at the Moskee Essalam in Rotterdam, Netherlands; gifts handed out to Muslim children in Lviv, Ukraine, as Russia's war on the country continues; Palestinians in Jabaliya in the northern Gaza Strip praying amid the rubble of a mosque destroyed in the current war between Israel and Hamas; families gathering at al-Aqsa mosque in Jerusalem, the third holiest site in Islam; a boy yawning during prayers at a stadium in Qatar; Muslims greeting each other at Martim Moniz Square in Lisbon, Portugal; women worshippers gathering for an outdoor prayer in Burgess Park, London; worshippers gathered outside Plebiscito Square in Naples, Italy; women taking pictures after attending prayers at the Hagia Sophia Grand Mosque in Istanbul, Turkey; Afghan refugees praying at a mosque on the outskirts of Peshawar, Pakistan.]


AI for the world, or just the West? How researchers are tackling Big Tech's global gaps

ZDNet

Since the launch of OpenAI's ChatGPT in 2022, artificial intelligence (AI) has become deeply entrenched in our lives. But popular AI products are set up to serve primarily American and European interests - from the use cases they are applied to, to the languages they speak - despite being touted as global tools democratizing access to technology. Several African researchers outside tech's US nucleus are trying to challenge that status quo and, with it, the bigger power dynamics at play in the AI industry. The Distributed AI Research Institute (DAIR) is an international group of researchers and technologists focused on what it calls "independent and community-rooted AI research free from Big Tech's pervasive influence." I spoke to DAIR members creating Africa-centric AI solutions that serve particular societal needs.


Automated Annotation of Evolving Corpora for Augmenting Longitudinal Network Data: A Framework Integrating Large Language Models and Expert Knowledge

arXiv.org Artificial Intelligence

Longitudinal network data are essential for analyzing political, economic, and social systems and processes. In political science, these datasets are often generated through human annotation or supervised machine learning applied to evolving corpora. However, as semantic contexts shift over time, inferring dynamic interaction types on emerging issues among a diverse set of entities poses significant challenges, particularly in maintaining timely and consistent annotations. This paper presents the Expert-Augmented LLM Annotation (EALA) approach, which leverages Large Language Models (LLMs) in combination with historically annotated data and expert-constructed codebooks to extrapolate and extend datasets into future periods. We evaluate the performance and reliability of EALA using a dataset of climate negotiations. Our findings demonstrate that EALA effectively predicts nuanced interactions between negotiation parties and captures the evolution of topics over time. At the same time, we identify several limitations inherent to LLM-based annotation, highlighting areas for further improvement. Given the wide availability of codebooks and annotated datasets, EALA holds substantial promise for advancing research in political science and beyond.
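The core EALA step described above - combining an expert codebook with historically annotated examples to prompt an LLM - can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the `AnnotatedExample` structure, and the `call_llm` stub are all assumptions standing in for whichever chat-completion API is actually used.

```python
# Minimal sketch of an expert-augmented LLM annotation step (not the EALA paper's
# exact implementation). It assembles a prompt from an expert codebook and a few
# historically annotated examples, then asks an LLM to label a new document.

from dataclasses import dataclass

@dataclass
class AnnotatedExample:
    text: str   # e.g. a negotiation statement from the historical corpus
    label: str  # e.g. an interaction type defined in the codebook

def build_prompt(codebook: str, examples: list[AnnotatedExample], new_text: str) -> str:
    shots = "\n".join(f"Text: {ex.text}\nLabel: {ex.label}" for ex in examples)
    return (
        "You are annotating climate-negotiation statements.\n"
        f"Codebook (label definitions):\n{codebook}\n\n"
        f"Labelled examples:\n{shots}\n\n"
        f"Text: {new_text}\nLabel:"
    )

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder: plug in any chat-completion client here.
    raise NotImplementedError

def annotate(codebook: str, history: list[AnnotatedExample], new_text: str) -> str:
    # Returns the predicted label for a document from a new (future) period.
    return call_llm(build_prompt(codebook, history, new_text)).strip()
```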


Robustness and Cybersecurity in the EU Artificial Intelligence Act

arXiv.org Artificial Intelligence

The EU Artificial Intelligence Act (AIA) establishes different legal principles for different types of AI systems. While prior work has sought to clarify some of these principles, little attention has been paid to robustness and cybersecurity. This paper aims to fill this gap. We identify legal challenges and shortcomings in the provisions related to robustness and cybersecurity for high-risk AI systems (Art. 15 AIA) and general-purpose AI models (Art. 55 AIA). We show that robustness and cybersecurity demand resilience against performance disruptions. Furthermore, we assess potential challenges in implementing these provisions in light of recent advancements in the machine learning (ML) literature. Our analysis informs efforts to develop harmonized standards and European Commission guidelines, as well as benchmarks and measurement methodologies under Art. 15(2) AIA. With this, we seek to bridge the gap between legal terminology and ML research, fostering better alignment between research and implementation efforts.


RideKE: Leveraging Low-Resource, User-Generated Twitter Content for Sentiment and Emotion Detection in Kenyan Code-Switched Dataset

arXiv.org Artificial Intelligence

Social media has become a crucial open-access platform for individuals to express opinions and share experiences. However, leveraging low-resource language data from Twitter is challenging due to scarce, poor-quality content and major variations in language use, such as slang and code-switching. Identifying tweets in these languages can be difficult because Twitter primarily supports high-resource languages. We analyze Kenyan code-switched data and evaluate four state-of-the-art (SOTA) transformer-based pretrained models for sentiment and emotion classification, using supervised and semi-supervised methods. We detail the methodology behind data collection and annotation, and the challenges encountered during the data curation phase. Our results show that XLM-R outperforms the other models. For sentiment analysis, the supervised XLM-R model achieves the highest accuracy (69.2%) and F1 score (66.1%), followed by semi-supervised XLM-R (67.2% accuracy, 64.1% F1 score). In emotion analysis, supervised DistilBERT leads in accuracy (59.8%) and F1 score (31%), followed by semi-supervised mBERT (59% accuracy, 26.5% F1 score). The AfriBERTa models show the lowest accuracy and F1 scores. All models tend to predict neutral sentiment, with AfriBERTa showing the highest bias and a unique sensitivity to the empathy emotion. Code and data: https://github.com/NEtori21/Ride_hailing
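As a rough illustration of the supervised setting that performs best in the paper, the sketch below fine-tunes XLM-R for three-way sentiment classification with Hugging Face Transformers. It is not the authors' training pipeline: the CSV file names, column names, label mapping, and hyperparameters are assumptions for illustration only.

```python
# Sketch of supervised XLM-R sentiment fine-tuning; file names and hyperparameters
# are hypothetical, not the RideKE authors' setup.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=3)

# Hypothetical files with a "text" column and an integer "label" column
# (0 = negative, 1 = neutral, 2 = positive).
data = load_dataset("csv", data_files={"train": "ride_ke_train.csv",
                                       "validation": "ride_ke_dev.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="xlmr-sentiment",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=data["train"],
    eval_dataset=data["validation"],
)
trainer.train()
print(trainer.evaluate())
```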


Building low-resource African language corpora: A case study of Kidawida, Kalenjin and Dholuo

arXiv.org Artificial Intelligence

Natural Language Processing is a crucial frontier in artificial intelligence, with broad applications in many areas, including public health, agriculture, education, and commerce. However, due to the lack of substantial linguistic resources, many African languages remain underrepresented in this digital transformation. This paper presents a case study on the development of linguistic corpora for three under-resourced Kenyan languages, Kidaw'ida, Kalenjin, and Dholuo, with the aim of advancing natural language processing and linguistic research in African communities. Our project, which lasted one year, employed a selective crowd-sourcing methodology to collect text and speech data from native speakers of these languages. Data collection involved (1) recording conversations and translating the resulting text into Kiswahili, thereby creating parallel corpora, and (2) reading and recording written texts to generate speech corpora. We made these resources freely accessible via open-research platforms, namely Zenodo for the parallel text corpora and Mozilla Common Voice for the speech datasets, thus facilitating ongoing contributions and giving developers access to train models and build Natural Language Processing applications. The project demonstrates how grassroots efforts in corpus building can support the inclusion of African languages in artificial intelligence innovations. In addition to filling resource gaps, these corpora are vital in promoting linguistic diversity and empowering local communities by enabling Natural Language Processing applications tailored to their needs. As African countries like Kenya increasingly embrace digital transformation, developing indigenous language resources becomes essential for inclusive growth. We encourage continued collaboration from native speakers and developers to expand and utilize these corpora.
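A minimal sketch of the parallel-corpus format this kind of collection produces is shown below: each transcribed sentence in the source language is paired with its Kiswahili translation and written out as a two-column TSV. The file name, column names, and the example sentence pair are hypothetical; this is not the project's actual release format on Zenodo.

```python
# Sketch of writing (source sentence, Kiswahili translation) pairs to a TSV file.
# The layout and the example pair are assumptions for illustration only.
import csv

def write_parallel_corpus(pairs: list[tuple[str, str]], path: str) -> None:
    """Write (source_sentence, kiswahili_translation) pairs as a two-column TSV."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["source", "kiswahili"])
        writer.writerows(pairs)

# Hypothetical Dholuo-Kiswahili greeting pair, for illustration only.
write_parallel_corpus([("Oyawore.", "Habari ya asubuhi.")], "dholuo_kiswahili.tsv")
```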


Algorithm for Semantic Network Generation from Texts of Low Resource Languages Such as Kiswahili

arXiv.org Artificial Intelligence

Processing low-resource languages such as Kiswahili with machine learning is difficult due to the lack of adequate training data. However, these languages remain important for human communication, are already in daily use, and their users need practical machine processing tasks such as summarization, disambiguation and even question answering (QA). One way of processing such languages while bypassing the need for training data is the use of semantic networks. Some low-resource languages, such as Kiswahili, have a subject-verb-object (SVO) structure, and semantic networks are likewise triples of subject-predicate-object, so SVO part-of-speech tags can be mapped onto a semantic network triple. An algorithm that processes raw natural language text and maps it into a semantic network is therefore necessary and desirable for structuring low-resource language texts. The algorithm was tested on the Kiswahili QA task, achieving up to 78.6% exact match. Highlights: Languages, both low- and high-resource, are important for communication. Low-resource languages lack the vast data repositories needed for machine learning. A language's part-of-speech tags can be used to derive meaning from it. An algorithm can create semantic networks out of the language's parts of speech. The resulting semantic network can support practical tasks such as QA.
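The SVO-to-triple mapping described above can be illustrated with a short sketch (not the paper's full algorithm): given a POS-tagged, SVO-ordered Kiswahili sentence, emit (subject, predicate, object) triples, which together form a simple semantic network. The tag names and the toy example are assumptions; a real pipeline would first run a Kiswahili POS tagger over raw text.

```python
# Toy SVO -> (subject, predicate, object) extraction; a simplified sketch, not the
# paper's algorithm. Tags are supplied directly instead of coming from a tagger.
from typing import Iterable

Triple = tuple[str, str, str]

def svo_triples(tagged: Iterable[tuple[str, str]]) -> list[Triple]:
    """Map an SVO-ordered, POS-tagged sentence to subject-predicate-object triples."""
    triples: list[Triple] = []
    subj = verb = None
    for word, tag in tagged:
        if tag == "NOUN" and subj is None:
            subj = word                          # first noun -> subject
        elif tag == "VERB" and subj is not None:
            verb = word                          # verb following the subject -> predicate
        elif tag == "NOUN" and verb is not None:
            triples.append((subj, verb, word))   # noun following the verb -> object
            subj = verb = None                   # reset for the next clause
    return triples

# "Juma anapenda chai" -- Juma (S) likes (V) tea (O)
print(svo_triples([("Juma", "NOUN"), ("anapenda", "VERB"), ("chai", "NOUN")]))
# -> [('Juma', 'anapenda', 'chai')]
```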


Planning, Living and Judging: A Multi-agent LLM-based Framework for Cyclical Urban Planning

arXiv.org Artificial Intelligence

Urban regeneration presents significant challenges in the context of urbanization, requiring adaptive approaches to tackle evolving needs. Leveraging advancements in large language models (LLMs), we propose Cyclical Urban Planning (CUP), a new paradigm that continuously generates, evaluates, and refines urban plans in a closed loop. Specifically, our multi-agent LLM-based framework consists of three key components: (1) Planning, where LLM agents generate and refine urban plans based on contextual data; (2) Living, where agents simulate the behaviors and interactions of residents, modeling life in the urban environment; and (3) Judging, which involves evaluating plan effectiveness and providing iterative feedback for improvement. This cyclical process enables a dynamic and responsive planning approach. Experiments on a real-world dataset demonstrate the effectiveness of our framework as a continuous and adaptive planning process.
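The three-component cycle described above lends itself to a very small schematic, shown below. This is not the authors' CUP implementation: `llm` is a hypothetical stand-in for any LLM client, and the prompts, roles, and fixed number of rounds are assumptions used only to make the closed loop concrete.

```python
# Schematic planning -> living -> judging loop; a hedged sketch, not the CUP system.

def llm(prompt: str) -> str:
    raise NotImplementedError  # hypothetical: plug in any LLM client here

def planner(context: str, feedback: str) -> str:
    return llm(f"Given site context:\n{context}\nand feedback:\n{feedback}\nPropose an urban plan.")

def residents(plan: str) -> str:
    return llm(f"Simulate how residents would live with and react to this plan:\n{plan}")

def judge(plan: str, life: str) -> str:
    return llm(f"Evaluate the plan:\n{plan}\nagainst simulated resident behaviour:\n{life}\nGive feedback.")

def cyclical_urban_planning(context: str, rounds: int = 3) -> str:
    feedback, plan = "none yet", ""
    for _ in range(rounds):          # closed loop: generate, simulate, evaluate, refine
        plan = planner(context, feedback)
        life = residents(plan)
        feedback = judge(plan, life)
    return plan
```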


CEHA: A Dataset of Conflict Events in the Horn of Africa

arXiv.org Artificial Intelligence

Natural Language Processing (NLP) of news articles can play an important role in understanding the dynamics and causes of violent conflict. Despite the availability of datasets categorizing various conflict events, existing labels often do not cover all of the fine-grained violent conflict event types relevant to areas like the Horn of Africa. In this paper, we introduce a new benchmark dataset, Conflict Events in the Horn of Africa (CEHA), and propose a new task for identifying violent conflict events from online resources using this dataset. The dataset consists of 500 English event descriptions of conflict events in the Horn of Africa region, with fine-grained event-type definitions that emphasize the cause of the conflict. It categorizes the key types of conflict risk according to specific areas required by stakeholders in the Humanitarian-Peace-Development Nexus. Additionally, we conduct extensive experiments on two tasks supported by this dataset: Event-relevance Classification and Event-type Classification. Our baseline models demonstrate the challenging nature of these tasks and the usefulness of our dataset for model evaluation in low-resource settings with a limited amount of training data.
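As a hedged illustration of the two tasks, the sketch below runs an off-the-shelf zero-shot classifier over an invented event description. This is a naive stand-in, not one of the paper's baselines, and the candidate labels are simplified assumptions rather than CEHA's actual fine-grained event-type definitions.

```python
# Zero-shot baseline sketch for event-relevance and event-type classification;
# labels are illustrative, not the CEHA taxonomy.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

description = "Clashes between two clan militias over grazing land left several people dead."

# Task 1: event-relevance classification
relevance = classifier(description,
                       candidate_labels=["violent conflict event", "not a conflict event"])

# Task 2: event-type classification (coarse, invented labels)
event_type = classifier(description,
                        candidate_labels=["political violence", "communal/resource conflict",
                                          "armed group attack", "civil unrest"])

print(relevance["labels"][0], "|", event_type["labels"][0])
```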


Uchaguzi-2022: A Dataset of Citizen Reports on the 2022 Kenyan Election

arXiv.org Artificial Intelligence

Online reporting platforms have enabled citizens around the world to collectively share their opinions and report in real time on events impacting their local communities. Systematically organizing (e.g., categorizing by attributes) and geotagging large amounts of crowdsourced information is crucial to ensuring that accurate and meaningful insights can be drawn from this data and used by policy makers to bring about positive change. These tasks, however, typically require extensive manual annotation efforts. In this paper we present Uchaguzi-2022, a dataset of 14k categorized and geotagged citizen reports related to the 2022 Kenyan General Election containing mentions of election-related issues such as official misconduct, vote count irregularities, and acts of violence. We use this dataset to investigate whether language models can assist in scalably categorizing and geotagging reports, thus highlighting its potential application in the AI for Social Good space.
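To make the two organization steps concrete, the sketch below categorizes and geotags a single invented report using simple keyword and gazetteer lookups. This is a deliberately naive rule-based stand-in rather than the language-model approach the paper investigates; the category keywords, gazetteer entries, and approximate coordinates are assumptions for illustration.

```python
# Naive rule-based categorization and geotagging of a citizen report; a sketch only,
# not the Uchaguzi-2022 annotation scheme or the paper's models.
from typing import Optional

CATEGORIES = {
    "official misconduct": ["bribe", "official", "clerk"],
    "vote count irregularities": ["count", "tally", "ballot"],
    "violence": ["attack", "fight", "injured"],
}

# Tiny hypothetical gazetteer of Kenyan locations -> approximate (lat, lon).
GAZETTEER = {"nairobi": (-1.286, 36.817), "mombasa": (-4.043, 39.668), "kisumu": (-0.091, 34.768)}

def categorize(report: str) -> str:
    text = report.lower()
    scores = {cat: sum(kw in text for kw in kws) for cat, kws in CATEGORIES.items()}
    return max(scores, key=scores.get)  # falls back to an arbitrary category if no keyword matches

def geotag(report: str) -> Optional[tuple[str, tuple[float, float]]]:
    text = report.lower()
    return next(((place, coords) for place, coords in GAZETTEER.items() if place in text), None)

report = "Ballot tally dispute reported at a polling station in Kisumu."
print(categorize(report), geotag(report))
```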