Africa
Are tech companies using your private data to train AI models?
Leading tech companies are in a race to release and improve artificial intelligence (AI) products, leaving users in the United States to puzzle out how much of their personal data could be extracted to train AI tools. Meta (which owns Facebook, Instagram, Threads and WhatsApp), Google and LinkedIn have all rolled out AI app features that have the capacity to draw on users' public profiles or emails. Google and LinkedIn offer users ways to opt out of the AI features, while Meta's AI tool provides no means for its users to say "no, thanks." Social media posts have warned that the platforms' AI tool rollouts make most private information available for tech company harvesting.
Ukraine's soldiers react to US peace plan with defiance, anger and resignation
Ukraine's frontline soldiers have reacted to draft US peace proposals with a mixture of defiance, anger and resignation. The BBC spoke to half a dozen soldiers, who sent us their views via social media and email in response to the original US plan, details of which were leaked last week. Since then, American and Ukrainian negotiators have been working on changes to the proposals and are set to continue talks about the peace framework. Of the original US plan, Yaroslav, in eastern Ukraine, says it "sucks" and that "no one will support it", while an army medic with the call sign Shtutser dismissed it as "an absolutely disgraceful draft of a peace plan, unworthy of our attention". But one soldier with the call sign Snake told us "it's time to agree at least on something".
Platonic Representations for Poverty Mapping: Unified Vision-Language Codes or Agent-Induced Novelty?
Murugaboopathy, Satiyabooshan, Jerzak, Connor T., Daoud, Adel
We investigate whether socio-economic indicators like household wealth leave recoverable imprints in satellite imagery (capturing physical features) and Internet-sourced text (reflecting historical/economic narratives). Using Demographic and Health Survey (DHS) data from African neighborhoods, we pair Landsat images with LLM-generated textual descriptions conditioned on location/year and text retrieved by an AI search agent from web sources. We develop a multimodal framework predicting household wealth (International Wealth Index) through five pipelines: (i) vision model on satellite images, (ii) LLM using only location/year, (iii) AI agent searching/synthesizing web text, (iv) joint image-text encoder, (v) ensemble of all signals. Our framework yields three contributions. First, fusing vision and agent/LLM text outperforms vision-only baselines in wealth prediction (e.g., R-squared of 0.77 vs. 0.63 on out-of-sample splits), with LLM-internal knowledge proving more effective than agent-retrieved text, improving robustness to out-of-country and out-of-time generalization. Second, we find partial representational convergence: fused embeddings from vision/language modalities correlate moderately (median cosine similarity of 0.60 after alignment), suggesting a shared latent code of material well-being while retaining complementary details, consistent with the Platonic Representation Hypothesis. Although LLM-only text outperforms agent-retrieved data, challenging our Agent-Induced Novelty Hypothesis, modest gains from combining agent data in some splits weakly support the notion that agent-gathered information introduces unique representational structures not fully captured by static LLM knowledge. Third, we release a large-scale multimodal dataset comprising more than 60,000 DHS clusters linked to satellite images, LLM-generated descriptions, and agent-retrieved texts.
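The fusion and convergence results rest on two simple quantities: R-squared for the wealth predictions and cosine similarity between paired vision and text embeddings. The sketch below illustrates both in plain Python under stated assumptions: the ensemble (pipeline v) is taken to be a weighted average of per-cluster predictions, and the embeddings are assumed to be already projected into a shared space. The abstract specifies neither the ensembling rule nor the alignment method, and all function names here are hypothetical.

```python
import math
from statistics import mean, median


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def ensemble(predictions_by_pipeline, weights=None):
    """Assumed late fusion: weighted average of each pipeline's per-cluster predictions."""
    n = len(predictions_by_pipeline)
    if weights is None:
        weights = [1.0 / n] * n  # equal weights unless the caller supplies others
    n_clusters = len(predictions_by_pipeline[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions_by_pipeline))
        for i in range(n_clusters)
    ]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def median_paired_cosine(vision_embs, text_embs):
    """Median cosine over clusters; assumes the two embedding sets are already aligned."""
    return median(cosine(u, v) for u, v in zip(vision_embs, text_embs))
```

In use, one would call `ensemble([vision_preds, text_preds])` and compare `r_squared(iwi, fused)` against `r_squared(iwi, vision_preds)` on the same held-out split, where `iwi`, `vision_preds` and `text_preds` are hypothetical lists of per-cluster values.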
How Well Do LLMs Understand Tunisian Arabic?
Large Language Models (LLMs) are the engines driving today's AI agents. The better these models understand human languages, the more natural and user-friendly the interaction with AI becomes, from everyday devices like computers and smartwatches to any tool that can act intelligently. Yet, the ability of industrial-scale LLMs to comprehend low-resource languages, such as Tunisian Arabic (Tunizi), is often overlooked. This neglect risks excluding millions of Tunisians from fully interacting with AI in their own language, pushing them toward French or English. Such a shift not only threatens the preservation of the Tunisian dialect but may also create challenges for literacy and influence younger generations to favor foreign languages. In this study, we introduce a novel dataset containing parallel Tunizi, standard Tunisian Arabic, and English translations, along with sentiment labels. We benchmark several popular LLMs on three tasks: transliteration, translation, and sentiment analysis. Our results reveal significant differences between models, highlighting both their strengths and limitations in understanding and processing Tunisian dialects. By quantifying these gaps, this work underscores the importance of including low-resource languages in the next generation of AI systems, ensuring technology remains accessible, inclusive, and culturally grounded.
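Benchmarking models on these tasks requires a scoring function for each one. The minimal sketch below assumes character error rate (CER, Levenshtein distance divided by reference length) for transliteration and plain accuracy for sentiment labels. The abstract does not name its metrics, the example field names (`tunizi`, `arabic`, `sentiment`) are hypothetical, and translation scoring (for example BLEU or chrF) is left out. The `transliterate` and `classify` arguments stand in for whatever wrapper calls an LLM.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of a transliteration against its reference."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


def accuracy(gold, predicted) -> float:
    """Fraction of exactly matching labels."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def evaluate(examples, transliterate, classify):
    """Score one model; examples are dicts with hypothetical keys tunizi/arabic/sentiment."""
    cers = [cer(ex["arabic"], transliterate(ex["tunizi"])) for ex in examples]
    acc = accuracy([ex["sentiment"] for ex in examples],
                   [classify(ex["tunizi"]) for ex in examples])
    return {"mean_cer": sum(cers) / len(cers), "sentiment_accuracy": acc}
```

Running `evaluate` once per model with the same `examples` list gives directly comparable per-task scores.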
Shona spaCy: A Morphological Analyzer for an Under-Resourced Bantu Language
Despite rapid advances in multilingual natural language processing (NLP), the Bantu language Shona remains under-served in terms of morphological analysis and language-aware tools. This paper presents Shona spaCy, an open-source, rule-based morphological pipeline for Shona built on the spaCy framework. The system combines a curated JSON lexicon with linguistically grounded rules to model noun-class prefixes (Mupanda 1-18), verbal subject concords, tense-aspect markers, ideophones, and clitics, integrating these into token-level annotations for lemma, part-of-speech, and morphological features. The toolkit is available via pip install shona-spacy, with source code at https://github.com/HappymoreMasoka/shona-spacy and a PyPI release at https://pypi.org/project/shona-spacy/0.1.4/. Evaluation on formal and informal Shona corpora yields 90% POS-tagging accuracy and 88% morphological-feature accuracy, while maintaining transparency in its linguistic decisions. By bridging descriptive grammar and computational implementation, Shona spaCy advances NLP accessibility and digital inclusion for Shona speakers and provides a template for morphological analysis tools for other under-resourced Bantu languages.
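The "JSON lexicon plus rules" design can be illustrated with a lexicon lookup that falls back to longest-prefix matching on noun-class prefixes. The sketch below is hypothetical and does not use the shona-spacy API: the lexicon format, the `analyze` function and its output keys are assumptions, and the prefix table covers only a subset of classes (nasal classes 9/10, class 5 and class 14 are omitted). Ambiguous prefixes such as mu- (classes 1, 3 and 18) return several candidate classes, which is exactly where a lexicon entry is needed to disambiguate.

```python
import json

# Hypothetical mini-lexicon in JSON form; not the real shona-spacy lexicon format.
LEXICON = json.loads("""
{
  "munhu": {"lemma": "munhu", "pos": "NOUN", "noun_class": 1}
}
""")

# Subset of Shona noun-class prefixes mapped to candidate class numbers.
PREFIX_CLASSES = {
    "mu": [1, 3, 18], "va": [2], "mi": [4], "ma": [6],
    "chi": [7], "zvi": [8], "ru": [11], "ka": [12],
    "tu": [13], "ku": [15, 17], "pa": [16],
}


def analyze(token):
    """Lexicon lookup first, then longest matching noun-class prefix as a fallback rule."""
    word = token.lower()
    if word in LEXICON:
        return {**LEXICON[word], "source": "lexicon"}
    # Try longer prefixes first so e.g. "zvi" wins over any shorter match.
    for prefix in sorted(PREFIX_CLASSES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) > len(prefix):
            return {
                "prefix": prefix,
                "stem": word[len(prefix):],
                "candidate_classes": PREFIX_CLASSES[prefix],
                "source": "rule",
            }
    return {"source": "unknown"}
```

For example, `analyze("vanhu")` splits the class 2 prefix va- from the stem, while `analyze("Munhu")` is resolved directly from the lexicon.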
Multi-Objective Reinforcement Learning for Water Management
Osika, Zuzanna, Rădulescu, Roxana, Salazar, Jazmin Zatarain, Oliehoek, Frans, Murukannaiah, Pradeep K.
Many real-world problems (e.g., resource management, autonomous driving, drug discovery) require optimizing multiple, conflicting objectives. Multi-objective reinforcement learning (MORL) extends classic reinforcement learning to handle multiple objectives simultaneously, yielding a set of policies that capture various trade-offs. However, the MORL field lacks complex, realistic environments and benchmarks. We introduce a water resource (Nile river basin) management case study and model it as a MORL environment. We then benchmark existing MORL algorithms on this task. Our results show that specialized water management methods outperform state-of-the-art MORL approaches, underscoring the scalability challenges MORL algorithms face in real-world scenarios.
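The "set of policies that capture various trade-offs" is usually summarised as a Pareto front of non-dominated return vectors. The sketch below is an illustration, not the paper's benchmark code: it filters non-dominated policies under maximisation and computes the two-objective hypervolume, a common MORL quality indicator. The objective names in the comments (hydropower, irrigation supply) and the reference point are hypothetical, and the abstract does not say which metrics the benchmark reports.

```python
def dominates(a, b):
    """True if return vector a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the non-dominated policy returns (maximisation)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def hypervolume_2d(front, ref):
    """Area dominated by a 2-objective non-dominated front and bounded below by ref."""
    hv = 0.0
    prev_y = ref[1]
    # Sweep from the largest first objective to the smallest, adding horizontal slabs.
    for x, y in sorted(front, key=lambda p: p[0], reverse=True):
        if y > prev_y:
            hv += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return hv


# Hypothetical per-policy returns: (hydropower, irrigation supply).
policies = [(1.0, 5.0), (3.0, 3.0), (2.0, 2.0), (5.0, 1.0)]
front = pareto_front(policies)
```

Comparing hypervolumes computed with a shared reference point is one way to rank MORL algorithms against specialised water-management baselines on the same environment.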
ISS-Geo142: A Benchmark for Geolocating Astronaut Photography from the International Space Station
Srivastava, Vedika, Singh, Hemant Kumar, Singh, Jaisal
This paper introduces ISS-Geo142, a curated benchmark for geolocating astronaut photography captured from the International Space Station (ISS). Although the ISS position at capture time is known precisely, the specific Earth locations depicted in these images are typically not directly georeferenced, making automated localization non-trivial. ISS-Geo142 consists of 142 images with associated metadata and manually determined geographic locations, spanning a range of spatial scales and scene types. On top of this benchmark, we implement and evaluate three geolocation pipelines: a neural-network-based approach (NN-Geo) using VGG16 features and cross-correlation over map-derived Areas of Interest (AOIs); a Scale-Invariant Feature Transform (SIFT)-based pipeline (SIFT-Match) using sliding-window feature matching on stitched high-resolution AOIs; and TerraByte, an AI system built around a GPT-4 model with vision capabilities that jointly reasons over image content and ISS coordinates. On ISS-Geo142, NN-Geo achieves a match for 75.52% of the images under our evaluation protocol, SIFT-Match attains high precision on structurally rich scenes at substantial computational cost, and TerraByte establishes the strongest overall baseline, correctly geolocating approximately 90% of the images while also producing human-readable geographic descriptions. The methods and experiments were originally developed in 2023; this manuscript is a revised and extended version that situates the work relative to subsequent advances in cross-view geo-localization and remote-sensing vision-language models. Taken together, ISS-Geo142 and these three pipelines provide a concrete, historically grounded benchmark for future work on ISS image geolocation.
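NN-Geo's cross-correlation step can be illustrated with normalised cross-correlation (NCC) template matching: slide a small template over a larger grid and keep the offset with the highest correlation. In the sketch below, plain 2D lists stand in for the VGG16 feature maps and AOI map tiles, whose exact form the abstract does not describe. It shows the matching idea only, not the paper's pipeline, and the function names are hypothetical.

```python
import math


def ncc(window, template):
    """Pearson-style normalised cross-correlation between two equal-size 2D patches."""
    a = [v for row in window for v in row]
    b = [v for row in template for v in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0  # flat patches carry no signal


def best_match(image, template):
    """Exhaustive sliding-window search; returns (best score, (row, col) offset)."""
    th, tw = len(template), len(template[0])
    best = (-2.0, None)  # NCC lies in [-1, 1], so any window beats this start value
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            window = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(window, template)
            if score > best[0]:
                best = (score, (r, c))
    return best
```

The exhaustive double loop mirrors why sliding-window matching is costly on large stitched AOIs: the work grows with the number of candidate offsets times the template size.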
Rubio hails 'tremendous progress' at Ukraine peace talks
A "tremendous amount of progress" has been achieved in talks to finalise a US-proposed peace plan to end the Russia-Ukraine war, Secretary of State Marco Rubio has said. But "there's still some work to be done", Rubio said after meeting Ukrainian and European negotiators in Geneva, Switzerland. Ukrainian President Volodymyr Zelensky said there were "signals that President [Donald] Trump's team is hearing us". Ukraine and its European allies had expressed concern over the leaked proposals, which were seen as favouring Russia and were welcomed by Vladimir Putin as the basis for a settlement. Zelensky had said Ukraine might face "a very difficult choice: either losing dignity, or risk losing a key partner".
Medieval Arabic texts help researchers track down explosive star deaths
In 1181, Egyptian, Chinese, and Japanese scholars documented a cosmic explosion. [Images: left, a scan of an Arabic manuscript of the dīwān of Ibn Sanā' al-Mulk dating to 1181-1182 (Fischer et al. 2025, Astronomical Notes); right, a meteor from the annual Perseid shower, seen in the sky on August 14, 2023, with a radiant bordering Cassiopeia and Camelopardalis.]
How can you tell if your new favourite artist is a real person?
There's a new song doing the rounds, and in the immortal words of Kylie Minogue, you just can't get it out of your head. But what if it was created by a robot, or the artist themself is a product of artificial intelligence (AI)? Do streaming sites have an obligation to label music as AI-generated? And does it even matter, if you like what you hear?