
Are drones, AI making it harder to fight armed groups in the Sahel?

Al Jazeera

The brazen attack on the international airport and nearby military airbase in Niamey, Niger's capital, came overnight between January 28 and 29. Balls of orange fire flew across the sky as the Nigerien army attempted to respond while residents ducked for cover and whispered prayers, as shown in videos on social media. ISIL (ISIS) in Sahel Province, or ISSP - a Niger-based outfit earlier known as the ISIL affiliate in the Greater Sahara, or ISGS - has since claimed responsibility and says it killed several soldiers, although the Nigerien army disputes this. Videos from the group show its fighters breaching military drone hangars with RPGs and mortars and damaging several aircraft, as well as one civilian aeroplane.


Three West African juntas have turned to Russia. Now the US wants to engage them

BBC News

The US has declared a stark policy shift towards three West African countries that are battling Islamist insurgents and whose military governments have broken defence ties with France and turned towards Russia. The State Department announced that Nick Checker, head of its Bureau of African Affairs, would visit Mali's capital, Bamako, to convey the United States' respect for Mali's sovereignty and to chart a new course in relations, moving past policy missteps. It added that the US also looks forward to co-operating with Mali's allies, neighbouring Burkina Faso and Niger, on shared security and economic interests. Absent from the agenda is the longstanding American concern for democracy and human rights.


Dealing with the Hard Facts of Low-Resource African NLP

Diarra, Yacouba, Coulibaly, Nouhoum Souleymane, Kamaté, Panga Azazia, Tall, Madani Amadou, Koné, Emmanuel Élisé, Dembélé, Aymane, Leventhal, Michael

arXiv.org Artificial Intelligence

Creating speech datasets, models, and evaluation frameworks for low-resource languages remains challenging given the lack of a broad base of pertinent experience to draw from. This paper reports on the field collection of 612 hours of spontaneous speech in Bambara, a low-resource West African language; the semi-automated annotation of that dataset with transcriptions; the creation of several monolingual ultra-compact and small models using the dataset; and the automatic and human evaluation of their output. We offer practical suggestions for data collection protocols, annotation, and model design, as well as evidence for the importance of performing human evaluation. In addition to the main dataset, multiple evaluation datasets, models, and code are made publicly available.


TEMPO: Global Temporal Building Density and Height Estimation from Satellite Imagery

Glazer, Tammy, Hacheme, Gilles Q., Zaytar, Akram, Marotti, Luana, Michaels, Amy, Tadesse, Girmaw Abebe, White, Kevin, Dodhia, Rahul, Zolli, Andrew, Becker-Reshef, Inbal, Ferres, Juan M. Lavista, Robinson, Caleb

arXiv.org Artificial Intelligence

We present TEMPO, a global, temporally resolved dataset of building density and height derived from high-resolution satellite imagery using deep learning models. We pair building footprint and height data from existing datasets with quarterly PlanetScope basemap satellite images to train a multi-task deep learning model that predicts building density and building height at a 37.6-meter per pixel resolution. We apply this model to global PlanetScope basemaps from Q1 2018 through Q2 2025 to create global, temporal maps of building density and height. We validate these maps by comparing against existing building footprint datasets. Our estimates achieve an F1 score between 85% and 88% on different hand-labeled subsets, and are temporally stable, with a 0.96 five-year trend-consistency score. TEMPO captures quarterly changes in built settlements at a fraction of the computational cost of comparable approaches, unlocking large-scale monitoring of development patterns and climate impacts essential for global resilience and adaptation efforts.
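The F1 validation described above can be illustrated with a minimal sketch. The pixel counts below are hypothetical, chosen only to show how an F1 in the paper's reported 85-88% range arises from precision and recall against a hand-labeled building mask:

```python
def f1_score(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative
    pixel counts against a hand-labeled reference mask."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for one validation tile:
score = f1_score(tp=870, fp=130, fn=120)  # ~0.874, inside the 85-88% band
```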


Cost Analysis of Human-corrected Transcription for Predominately Oral Languages

Diarra, Yacouba, Coulibaly, Nouhoum Souleymane, Leventhal, Michael

arXiv.org Artificial Intelligence

Creating speech datasets for low-resource languages is a critical yet poorly understood challenge, particularly regarding the actual cost in human labor. This paper investigates the time and complexity required to produce high-quality annotated speech data for a subset of low-resource languages, low-literacy Predominately Oral Languages, focusing on Bambara, a Manding language of Mali. Through a one-month field study involving ten transcribers with native proficiency, we analyze the correction of ASR-generated transcriptions of 53 hours of Bambara voice data. We report that it takes, on average, 30 hours of human labor to accurately transcribe one hour of speech data under laboratory conditions and 36 hours under field conditions. The study provides a baseline and practical insights for the large class of languages with comparable profiles undertaking the creation of NLP resources.
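The reported figures reduce to a simple real-time-factor calculation; a minimal sketch, using the paper's 30x (lab) and 36x (field) multipliers applied to its 53-hour corpus:

```python
def transcription_labor_hours(speech_hours, real_time_factor):
    """Estimate the human labor needed to correct ASR transcripts.

    real_time_factor: hours of labor per hour of speech
    (the study reports ~30x in the lab, ~36x in the field).
    """
    return speech_hours * real_time_factor

# The 53-hour Bambara corpus from the study, under both conditions:
lab_hours = transcription_labor_hours(53, 30)    # 1590 labor hours
field_hours = transcription_labor_hours(53, 36)  # 1908 labor hours
```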


Adaptive Layer-skipping in Pre-trained LLMs

Luo, Xuan, Wang, Weizhi, Yan, Xifeng

arXiv.org Artificial Intelligence

Various layer-skipping methods have been proposed to accelerate token generation in large language models (LLMs). However, limited attention has been paid to a fundamental question: How do computational demands vary across the generation of different tokens? In this work, we introduce FlexiDepth, a method that dynamically adjusts the number of Transformer layers used in text generation. By incorporating a plug-in router and adapter, FlexiDepth enables adaptive computation in LLMs without modifying their original parameters. Applied to Llama-3-8B, it skips 8 out of 32 layers while maintaining full benchmark performance. Our experiments reveal that computational demands in LLMs significantly vary based on token type. Specifically, generating repetitive tokens or fixed phrases requires fewer layers, whereas producing tokens involving computation or high uncertainty requires more layers. Despite the computational savings, FlexiDepth does not yet achieve wall-clock speedup due to varied skipping patterns and I/O overhead. To inspire future work and advance research on practical speedup, we open-sourced FlexiDepth and a dataset documenting its layer allocation patterns.
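The routing idea can be sketched in a few lines. This is a toy illustration, not FlexiDepth's implementation: the layer, router, and threshold below are all stand-ins (the real method also routes the skip path through an adapter, omitted here), but it shows the core mechanism of a per-token scalar gate deciding whether each layer runs or is bypassed:

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 16
N_LAYERS = 8

def layer(h, w):
    """Stand-in for a Transformer layer: residual plus a transform."""
    return h + np.tanh(h @ w)

def router_gate(h, r):
    """Plug-in router: a sigmoid score deciding whether to run a layer."""
    return 1.0 / (1.0 + np.exp(-(h @ r)))

def flexible_forward(h, layer_weights, router_weights, threshold=0.5):
    """Run each layer only when its router fires; otherwise pass the
    hidden state through unchanged (the skip path)."""
    layers_used = 0
    for w, r in zip(layer_weights, router_weights):
        if router_gate(h, r) >= threshold:
            h = layer(h, w)
            layers_used += 1
    return h, layers_used

weights = [rng.normal(size=(HIDDEN, HIDDEN)) * 0.1 for _ in range(N_LAYERS)]
routers = [rng.normal(size=HIDDEN) for _ in range(N_LAYERS)]
out, used = flexible_forward(rng.normal(size=HIDDEN), weights, routers)
```

Per the abstract, "easy" tokens (repetition, fixed phrases) would trigger fewer gates than tokens requiring computation or carrying high uncertainty.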


Mechanistic Interpretability with SAEs: Probing Religion, Violence, and Geography in Large Language Models

Simbeck, Katharina, Mahran, Mariam

arXiv.org Artificial Intelligence

Despite growing research on bias in large language models (LLMs), most work has focused on gender and race, with little attention to religious identity. This paper explores how religion is internally represented in LLMs and how it intersects with concepts of violence and geography. Using mechanistic interpretability and Sparse Autoencoders (SAEs) via the Neuronpedia API, we analyze latent feature activations across five models. We measure overlap between religion- and violence-related prompts and probe semantic patterns in activation contexts. While all five religions show comparable internal cohesion, Islam is more frequently linked to features associated with violent language. In contrast, geographic associations largely reflect real-world religious demographics, revealing how models embed both factual distributions and cultural stereotypes. These findings highlight the value of structural analysis in auditing not just outputs but also internal representations that shape model behavior.
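The overlap measurement can be sketched as set intersection over active SAE features. The activation vectors and the Jaccard choice below are illustrative assumptions, not the paper's exact metric:

```python
def active_features(activations, threshold=0.0):
    """Indices of SAE latent features firing above threshold for a prompt."""
    return {i for i, a in enumerate(activations) if a > threshold}

def feature_overlap(acts_a, acts_b, threshold=0.0):
    """Jaccard overlap between the active-feature sets of two prompts."""
    a = active_features(acts_a, threshold)
    b = active_features(acts_b, threshold)
    if not (a | b):
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical latent activations for a religion-related and a
# violence-related prompt over the same five SAE features:
religion_prompt_acts = [0.0, 2.1, 0.0, 0.7, 1.3]
violence_prompt_acts = [0.4, 1.9, 0.0, 0.0, 2.2]
overlap = feature_overlap(religion_prompt_acts, violence_prompt_acts)  # 0.5
```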


Machine Mirages: Defining the Undefined

Tembine, Hamidou

arXiv.org Artificial Intelligence

As multimodal machine intelligence systems started achieving average animal-level and average human-level fluency in many measurable tasks in processing images, language, and sound, they began to exhibit a new class of cognitive aberrations: machine mirages. These include delusion, illusion, confabulation, hallucination, misattribution error, semantic drift, semantic compression, exaggeration, causal inference failure, uncanny valley of perception, bluffing-patter-bullshitting, cognitive stereotypy, pragmatic misunderstanding, hypersignification, semantic reheating-warming, simulated authority effect, fallacious abductive leap, contextual drift, referential hallucination, semiotic Frankenstein effect, calibration failure, spurious correlation, bias amplification, concept drift sensitivity, misclassification under uncertainty, adversarial vulnerability, overfitting, prosodic misclassification, accent bias, turn boundary failure, semantic boundary confusion, noise overfitting, latency-induced decision drift, ambiguity collapse and other forms of error that mimic but do not replicate human or animal fallibility. This article presents some of the errors and argues that these failures must be explicitly defined and systematically assessed. Understanding machine mirages is essential not only for improving machine intelligence reliability but also for constructing a multiscale ethical, co-evolving intelligence ecosystem that respects the diverse forms of life, cognition, and expression it will inevitably touch.


The Serendipity of Claude AI: Case of the 13 Low-Resource National Languages of Mali

Dembele, Alou, Coulibaly, Nouhoum Souleymane, Leventhal, Michael

arXiv.org Artificial Intelligence

Most of the world's languages, often referred to as "low-resource languages", remain either unsupported or insufficiently supported due to the limited availability of data and language resources, as well as market, economic, and global inequality factors. Mali, a multilingual country with 13 official languages, including Bamanankan (Bambara), Bomu, Bozo, Dɔgɔsɔ (Dogon), Fulfulde (Fula), Hassaniya Arabic, Mamara (Minyanka), Maninka, Soninke, Sɔõɔy (Songhay), Senara, Tàmàsàyt (Tamasheq) and Xaasongaxanno (Kassonke), faces severe challenges in digital inclusion that limit economic development, educational advancement, and the preservation of cultural heritage (Bird, 2020; Nekoto et al., 2020). These languages share a penury of the language resources needed to train AI and NLP systems, which could play a role in lessening the digital divide (Hammarström et al., 2018). This penury ranges from severe, for a language like Bambara with very limited resources, to catastrophic, for languages like Bomu and Bozo with an almost complete absence of language resources. The need for innovative methods for low-resource languages has spawned varied strategies, such as transfer learning, zero-shot learning, and pre-trained models in related languages (Ruder, 2021; Adelani et al., 2022).


Controllable Context Sensitivity and the Knob Behind It

Minder, Julian, Du, Kevin, Stoehr, Niklas, Monea, Giovanni, Wendler, Chris, West, Robert, Cotterell, Ryan

arXiv.org Artificial Intelligence

When making predictions, a language model must trade off how much it relies on its context vs. its prior knowledge. Choosing how sensitive the model is to its context is a fundamental functionality, as it enables the model to excel at tasks like retrieval-augmented generation and question-answering. In this paper, we search for a knob which controls this sensitivity, determining whether language models answer from the context or their prior knowledge. To guide this search, we design a task for controllable context sensitivity. In this task, we first feed the model a context (Paris is in England) and a question (Where is Paris?); we then instruct the model to either use its prior or contextual knowledge and evaluate whether it generates the correct answer for both intents (either France or England). When fine-tuned on this task, instruction-tuned versions of Llama-3.1, Mistral-v0.3, and Gemma-2 can solve it with high accuracy (85-95%). Analyzing these high-performing models, we narrow down which layers may be important to context sensitivity using a novel linear time algorithm. Then, in each model, we identify a 1-D subspace in a single layer that encodes whether the model follows context or prior knowledge. Interestingly, while we identify this subspace in a fine-tuned model, we find that the exact same subspace serves as an effective knob in not only that model but also non-fine-tuned instruct and base models of that model family. Finally, we show a strong correlation between a model's performance and how distinctly it separates context-agreeing from context-ignoring answers in this subspace. These results suggest a single subspace facilitates how the model chooses between context and prior knowledge, hinting at a simple fundamental mechanism that controls this behavior.
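A common way to extract and use such a 1-D direction is a difference-of-means probe plus additive steering; the sketch below assumes synthetic hidden states and is only an illustration of the general technique, not the paper's linear-time identification algorithm:

```python
import numpy as np

def knob_direction(context_acts, prior_acts):
    """Estimate a 1-D 'knob' direction as the normalized difference of
    mean hidden states between context-following and prior-following runs."""
    d = context_acts.mean(axis=0) - prior_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def steer(hidden, direction, alpha):
    """Push a hidden state along the knob: positive alpha toward
    context-following behavior, negative alpha toward prior knowledge."""
    return hidden + alpha * direction

# Synthetic hidden states standing in for activations collected at one
# layer under the two instructions ("use context" vs "use prior"):
rng = np.random.default_rng(1)
ctx_acts = rng.normal(loc=1.0, size=(32, 8))
prior_acts = rng.normal(loc=-1.0, size=(32, 8))

d = knob_direction(ctx_acts, prior_acts)
steered = steer(rng.normal(size=8), d, alpha=4.0)
```

The abstract's striking finding is that one such direction, found in a fine-tuned model, transfers unchanged to instruct and base models of the same family.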