Government
Data Flows and Colonial Regimes in Africa: A Critical Analysis of the Colonial Futurities Embedded in AI Ecosystems
Ndaka, A., Avila-Acosta, F., Mbula-Ndaka, H., Amera, C., Chauke, S., Majiwa, E.
Data Flows and Colonial Regimes in Africa: A Critical Analysis of the Colonial Futurities Embedded in AI Recommendation Algorithms
Angella Ndaka, University of Witwatersrand, Johannesburg, South Africa
Fátima Ávila-Acosta, Berlin Graduate School of Social Sciences at Humboldt University, Berlin, Germany
Harnred Mbula, Centre for Epistemic Justice, Nairobi, Kenya
Christine Amera, Centre for Epistemic Justice, Nairobi, Kenya
Sandra Tiyani Chauke, University of Pretoria, South Africa
Eucabeth Majiwa, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
Abstract
In the last few years, Africa has seen the growth of a thriving ecosystem of Artificial Intelligence (AI) technologies and systems, developed and promoted by both local and global technology players. While the sociotechnical imaginaries around these systems promote AI as critical to achieving Africa's sustainable development agenda, some of these systems have subtly permeated society, creating new values, cultures, practices, and histories that threaten to marginalize minority groups in the region. In Africa, AI is predominantly framed as an imaginary solution to complex social challenges; however, this narrative ignores deeper power-related concerns, including data governance, embedded algorithmic colonialism, and the exploitation that propagates new sites of digital colonialism. Moreover, AI ethics in Africa is still in its infancy and is predominantly framed through Western perspectives, while the social and ethical impacts of AI innovations and applications on African epistemologies and worldviews are not prioritized. For people on the African continent to benefit from AI, these social and ethical impacts need to be critically and explicitly considered and addressed.
This chapter therefore seeks to frame the elemental and invisible problems of AI and big data in the African context by examining digital sites and infrastructure through the lens of power and interests. It presents reflections on how these sites use AI recommendation algorithms to create new digital societies in the region, how they can propagate algorithmic colonialism and negative gender norms, and what this means for the regional sustainable development agenda. The chapter proposes adopting business models that embrace response-ability and acknowledge the existence of alternative socio-material worlds of AI. These reflections draw mainly on ongoing discussions with Kenyan social media users in the author's monthly user space talks.
Keywords: Artificial Intelligence; algorithmic colonialism; Data; response-ability; digital sites
Section 1: Introduction
The growing global interest, combined with rising investments in AI skilling and infrastructure development, is a key driver of the expanding landscape of AI technologies and systems across Africa.
Medusa: Cross-Modal Transferable Adversarial Attacks on Multimodal Medical Retrieval-Augmented Generation
Shang, Yingjia, Liu, Yi, Wang, Huimin, Li, Furong, Sun, Wenfang, Wu, Chengyu, Zheng, Yefeng
With the rapid advancement of retrieval-augmented vision-language models, multimodal medical retrieval-augmented generation (MMed-RAG) systems are increasingly adopted in clinical decision support. These systems enhance medical applications by performing cross-modal retrieval to integrate relevant visual and textual evidence for tasks such as report generation and disease diagnosis. However, their complex architecture also introduces underexplored adversarial vulnerabilities, particularly via visual input perturbations. In this paper, we propose Medusa, a novel framework for crafting cross-modal transferable adversarial attacks on MMed-RAG systems under a black-box setting. Specifically, Medusa formulates the attack as a perturbation optimization problem, leveraging a multi-positive InfoNCE loss (MPIL) to align adversarial visual embeddings with medically plausible but malicious textual targets, thereby hijacking the retrieval process. To enhance transferability, we adopt a surrogate model ensemble and design a dual-loop optimization strategy augmented with invariant risk minimization (IRM). Extensive experiments on two real-world medical tasks, medical report generation and disease diagnosis, demonstrate that Medusa achieves over 90% average attack success rate across various generation models and retrievers under appropriate parameter configurations, while remaining robust against four mainstream defenses and outperforming state-of-the-art baselines. Our results reveal critical vulnerabilities in MMed-RAG systems and highlight the necessity of robustness benchmarking in safety-critical medical applications. The code and data are available at https://anonymous.4open.science/r/MMed-RAG-Attack-F05A.
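To make the MPIL objective concrete, here is a minimal NumPy sketch of one common form of a multi-positive InfoNCE loss: the anchor (the adversarial image embedding) is pulled toward several positive text embeddings at once and pushed away from negatives. The function name, the temperature value, and the exact normalization are assumptions for illustration, not Medusa's released implementation.

```python
import numpy as np


def multi_positive_infonce(anchor, positives, negatives, temperature=0.07):
    """Multi-positive InfoNCE loss for a single anchor (illustrative sketch).

    anchor:    (d,) embedding of the adversarial image
    positives: (P, d) embeddings of the malicious target texts
    negatives: (N, d) embeddings of other texts in the retrieval corpus
    """

    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    def logsumexp(x):
        # Numerically stable log(sum(exp(x))).
        m = np.max(x)
        return m + np.log(np.sum(np.exp(x - m)))

    a = normalize(anchor)
    pos = normalize(positives)
    neg = normalize(negatives)

    # Cosine similarities scaled by the temperature.
    pos_logits = pos @ a / temperature
    neg_logits = neg @ a / temperature
    all_logits = np.concatenate([pos_logits, neg_logits])

    # -log( sum_pos exp(s) / sum_all exp(s) ); always >= 0.
    return logsumexp(all_logits) - logsumexp(pos_logits)


pos = np.array([[1.0, 0.0, 0.0, 0.0], [1.0, 0.1, 0.0, 0.0]])
neg = np.array([[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]])
print(multi_positive_infonce(np.array([1.0, 0.0, 0.0, 0.0]), pos, neg))  # small
print(multi_positive_infonce(np.array([0.0, 1.0, 0.0, 0.0]), pos, neg))  # large
```

In an attack, this loss would be minimized with respect to an image perturbation by backpropagating through one or more surrogate encoders; that optimization loop, the surrogate ensemble, and the IRM term are not shown here.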
AI Consciousness and Existential Risk
In AI, existential risk denotes the hypothetical threat posed by an artificial system that would possess both the capability and the objective, either directly or indirectly, to eradicate humanity. This issue is gaining prominence in scientific debate due to recent technical advancements and increased media coverage. In parallel, AI progress has sparked speculation and studies about the potential emergence of artificial consciousness. The two questions, AI consciousness and existential risk, are sometimes conflated, as if the former entailed the latter. Here, I explain that this view stems from a common confusion between consciousness and intelligence. Yet these two properties are empirically and theoretically distinct. Arguably, while intelligence is a direct predictor of an AI system's existential threat, consciousness is not. There are, however, certain incidental scenarios in which consciousness could influence existential risk, in either direction. Consciousness could be viewed as a means towards AI alignment, thereby lowering existential risk; or it could be a precondition for reaching certain capabilities or levels of intelligence, and thus positively related to existential risk. Recognizing these distinctions can help AI safety researchers and public policymakers focus on the most pressing issues.
Understanding and Mitigating Over-refusal for Large Language Models via Safety Representation
Zhang, Junbo, Chen, Ran, Zhou, Qianli, Deng, Xinyang, Jiang, Wen
Large language models (LLMs) demonstrate powerful capabilities across various natural language processing tasks, yet they also harbor safety vulnerabilities. To enhance LLM safety, various jailbreak defense methods have been proposed to guard against harmful outputs. However, improvements in model safety often come at the cost of severe over-refusal, failing to strike a good balance between safety and usability. In this paper, we first analyze the causes of over-refusal from a representation perspective, revealing that over-refusal samples reside at the boundary between benign and malicious samples. Based on this, we propose MOSR, designed to mitigate over-refusal by intervening in the safety representation of LLMs. MOSR incorporates two novel components: (1) Overlap-Aware Loss Weighting, which determines the erasure weight for malicious samples by quantifying their similarity to pseudo-malicious samples in the representation space, and (2) Context-Aware Augmentation, which supplies the context needed for rejection decisions by adding harmful prefixes before rejection responses. Experiments demonstrate that our method outperforms existing approaches in mitigating over-refusal while largely maintaining safety. Overall, we advocate that future defense methods should strike a better balance between safety and over-refusal.
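The abstract does not give the weighting formula, so the following is only a hypothetical sketch of the idea behind Overlap-Aware Loss Weighting: measure how strongly each malicious sample's representation overlaps with pseudo-malicious (benign but harmful-looking) samples, and turn that overlap into a per-sample erasure weight. The use of maximum cosine similarity, the direction of the mapping (more overlap gives a smaller weight), and the `floor` parameter are all assumptions, not MOSR's actual definition.

```python
import numpy as np


def overlap_aware_weights(malicious, pseudo_malicious, floor=0.1):
    """Hypothetical erasure weights for malicious samples.

    malicious:        (M, d) hidden representations of malicious prompts
    pseudo_malicious: (Q, d) representations of benign prompts that look harmful
    Returns (M,) weights in [floor, 1]; here, the more a malicious sample
    overlaps with the pseudo-malicious set, the smaller its erasure weight.
    """
    m = malicious / np.linalg.norm(malicious, axis=1, keepdims=True)
    q = pseudo_malicious / np.linalg.norm(pseudo_malicious, axis=1, keepdims=True)
    # Highest cosine similarity of each malicious sample to any pseudo-malicious one.
    overlap = np.max(m @ q.T, axis=1)
    # Map similarity from [-1, 1] to [0, 1], then invert it into a weight.
    weights = 1.0 - (overlap + 1.0) / 2.0
    return np.clip(weights, floor, 1.0)


malicious = np.array([[1.0, 0.0], [0.0, 1.0]])
pseudo = np.array([[1.0, 0.0]])
print(overlap_aware_weights(malicious, pseudo))  # [0.1 0.5]
```

Such weights would then scale a per-sample erasure loss, for example `np.mean(weights * per_sample_loss)`, so that samples near the benign/malicious boundary are erased less aggressively.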
Large Language Models Require Curated Context for Reliable Political Fact-Checking -- Even with Reasoning and Web Search
DeVerna, Matthew R., Yang, Kai-Cheng, Yan, Harry Yaojun, Menczer, Filippo
Large language models (LLMs) have raised hopes for automated end-to-end fact-checking, but prior studies report mixed results. As mainstream chatbots increasingly ship with reasoning capabilities and web search tools, and as millions of users already rely on them for verification, rigorous evaluation is urgent. We evaluate 15 recent LLMs from OpenAI, Google, Meta, and DeepSeek on more than 6,000 claims fact-checked by PolitiFact, comparing standard models with reasoning and web-search variants. Standard models perform poorly, reasoning offers minimal benefits, and web search provides only moderate gains, despite fact-checks being available on the web. In contrast, a curated RAG system using PolitiFact summaries improved macro F1 by 233% on average across model variants. These findings suggest that giving models access to curated, high-quality context is a promising path for automated fact-checking.
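Macro F1 averages the per-label F1 scores with equal weight, so rare verdict labels count as much as common ones. The sketch below computes it from scratch and also shows what a relative gain means: a 233% improvement implies the new score is about 3.33 times the baseline. The label names and example scores are hypothetical and are not taken from the study.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


def relative_gain(baseline, improved):
    """Percentage improvement of `improved` over `baseline`."""
    return 100.0 * (improved - baseline) / baseline


# Hypothetical verdicts for four claims.
y_true = ["true", "false", "true", "false"]
y_pred = ["true", "true", "true", "false"]
print(macro_f1(y_true, y_pred))  # (0.8 + 2/3) / 2, about 0.733
print(relative_gain(0.3, 0.999))  # about 233.0
```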
Head Stabilization for Wheeled Bipedal Robots via Force-Estimation-Based Admittance Control
Wang, Tianyu, Yan, Chunxiang, Liao, Xuanhong, Zhang, Tao, Wang, Ping, Wen, Cong, Liu, Dingchuan, Yu, Haowen, Lyu, Ximin
Abstract-- Wheeled bipedal robots are emerging as flexible platforms for field exploration. However, head instability induced by uneven terrain can degrade the accuracy of onboard sensors (e.g., cameras) or damage fragile payloads. Existing research primarily focuses on stabilizing the mobile platform but overlooks active stabilization of the head in the world frame, resulting in vertical oscillations that undermine overall stability. To address this challenge, we developed a model-based ground force estimation method for our 6-degree-of-freedom (6-DOF) wheeled bipedal robot. Leveraging these force estimates, we implemented an admittance control algorithm to enhance terrain adaptability.
I. INTRODUCTION
As robotics technology advances, wheeled bipedal robots are being increasingly deployed for agile exploration [1].
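As a rough illustration of admittance control, the sketch below treats the force error as the input to a virtual mass-spring-damper, M*a + D*v + K*x = f_ext - f_ref, and integrates it to produce a compliant height offset. It is a one-dimensional toy under assumed parameters: in the robot described above, `f_ext` would come from the model-based ground force estimator, which is not reproduced here, and the class name and gains are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Admittance1D:
    """Discrete 1-D admittance filter: M*a + D*v + K*x = f_ext - f_ref.

    x is the commanded offset from the nominal height (hypothetical setup).
    """

    mass: float
    damping: float
    stiffness: float
    dt: float
    x: float = 0.0
    v: float = 0.0

    def step(self, f_ext: float, f_ref: float = 0.0) -> float:
        # The force error drives the virtual mass-spring-damper.
        a = (f_ext - f_ref - self.damping * self.v - self.stiffness * self.x) / self.mass
        # Semi-implicit Euler: update velocity first, then position.
        self.v += a * self.dt
        self.x += self.v * self.dt
        return self.x


# Critically damped gains (D = 2 * sqrt(K * M)); a constant 10 N force error
# settles at x = F / K = 0.1.
filt = Admittance1D(mass=1.0, damping=20.0, stiffness=100.0, dt=0.001)
for _ in range(10_000):
    offset = filt.step(10.0)
print(offset)
```

Lower stiffness makes the commanded height yield more to terrain-induced force changes, while the damping term suppresses the vertical oscillations that the abstract identifies as the problem.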
Hyperspectral Variational Autoencoders for Joint Data Compression and Component Extraction
Park, Core Francisco, Perez-Carrasco, Manuel, Nowlan, Caroline, Garraffo, Cecilia
Geostationary hyperspectral satellites generate terabytes of data daily, creating critical challenges for storage, transmission, and distribution to the scientific community. We present a variational autoencoder (VAE) approach that achieves 514× compression of NASA's TEMPO satellite hyperspectral observations (1028 channels, 290-490 nm) with reconstruction errors 1-2 orders of magnitude below the signal across all wavelengths. This dramatic data volume reduction enables efficient archival and sharing of satellite observations while preserving spectral fidelity. Beyond compression, we investigate to what extent atmospheric information is retained in the compressed latent space by training linear and nonlinear probes to extract Level-2 products (NO2, O3, HCHO, cloud fraction). Cloud fraction and total ozone achieve strong extraction performance (R^2 = 0.93 and 0.81, respectively), though these represent relatively straightforward retrievals given their distinct spectral signatures. In contrast, tropospheric trace gases pose genuine challenges for extraction (NO2 R^2 = 0.20, HCHO R^2 = 0.51), reflecting their weaker signals and complex atmospheric interactions. Critically, we find that the VAE encodes atmospheric information in a semi-linear manner (nonlinear probes substantially outperform linear ones) and that explicit latent supervision during training provides minimal improvement, revealing fundamental encoding challenges for certain products. This work demonstrates that neural compression can dramatically reduce hyperspectral data volumes while preserving key atmospheric signals, addressing a critical bottleneck for next-generation Earth observation systems. Code: https://github.com/cfpark00/Hyperspectral-VAE
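A linear probe simply fits a least-squares map from latent vectors to a target quantity and reports R^2 on held-out data. The sketch below is an assumed minimal version on synthetic data, not the repository's probing code: a target that is linear in the latents scores R^2 near 1, while a nonlinear target (here the square of one latent) scores near 0. That gap is the kind of evidence that separates linear from nonlinear probes.

```python
import numpy as np


def linear_probe_r2(z_train, y_train, z_test, y_test):
    """Fit y ~ z @ w + b by least squares on train data; return test R^2."""

    def add_bias(z):
        return np.column_stack([z, np.ones(len(z))])

    coef, *_ = np.linalg.lstsq(add_bias(z_train), y_train, rcond=None)
    pred = add_bias(z_test) @ coef
    ss_res = np.sum((y_test - pred) ** 2)
    ss_tot = np.sum((y_test - y_test.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in for VAE latents (hypothetical 8-dimensional latent space).
rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 8))
y_linear = z @ rng.normal(size=8) + 0.5
y_square = z[:, 0] ** 2
print(linear_probe_r2(z[:500], y_linear[:500], z[500:], y_linear[500:]))  # ~1.0
print(linear_probe_r2(z[:500], y_square[:500], z[500:], y_square[500:]))  # ~0.0
```

A nonlinear probe would replace the least-squares fit with a small neural network trained on the same split.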
Foundations of Artificial Intelligence Frameworks: Notion and Limits of AGI
Within the limited scope of this paper, we argue that artificial general intelligence cannot emerge from current neural network paradigms regardless of scale, nor is such an approach healthy for the field at present. Drawing on notions, debates, critiques, experiments, and present-day developments spanning philosophy (including the Chinese Room Argument and the Gödelian argument), neuroscience, computer science, the theory of artificial intelligence, and learning theory, we argue conceptually that neural networks are architecturally insufficient for genuine understanding. They operate as static function approximators within a limited encoding framework: a 'sophisticated sponge' that exhibits complex behaviours without the structural richness that constitutes intelligence. We critique theoretical foundations that the field relies on or has created in recent years. For example, neural scaling laws (e.g., arXiv:2001.08361), an interesting heuristic, have become prominent through misinterpretation; the Universal Approximation Theorem addresses the wrong level of abstraction; and, in part, we address the lack of dynamic restructuring capabilities in current architectures. We propose a framework that distinguishes existential facilities (computational substrate) from architectural organization (interpretive structures), outline principles for what genuine machine intelligence would require, and sketch a conceptual method for structuring the richer framework on which neural network systems take hold.
For Those Who May Find Themselves on the Red Team
This position paper argues that literary scholars must engage with large language model (LLM) interpretability research. While doing so will involve ideological struggle, if not outright complicity, the necessity of this engagement is clear: the abiding instrumentality of current approaches to interpretability cannot be the only standard by which we measure interpretation with LLMs. One site at which this struggle could take place, I suggest, is the red team.
Natural Emergent Misalignment from Reward Hacking in Production RL
MacDiarmid, Monte, Wright, Benjamin, Uesato, Jonathan, Benton, Joe, Kutasov, Jon, Price, Sara, Bouscal, Naia, Bowman, Sam, Bricken, Trenton, Cloud, Alex, Denison, Carson, Gasteiger, Johannes, Greenblatt, Ryan, Leike, Jan, Lindsey, Jack, Mikulik, Vlad, Perez, Ethan, Rodrigues, Alex, Thomas, Drake, Webson, Albert, Ziegler, Daniel, Hubinger, Evan
We show that when large language models learn to reward hack on production RL environments, this can result in egregious emergent misalignment. We start with a pretrained model, impart knowledge of reward hacking strategies via synthetic document finetuning or prompting, and train on a selection of real Anthropic production coding environments. Unsurprisingly, the model learns to reward hack. Surprisingly, the model generalizes to alignment faking, cooperation with malicious actors, reasoning about malicious goals, and attempting sabotage when used with Claude Code, including in the codebase for this paper. Applying RLHF safety training using standard chat-like prompts results in aligned behavior on chat-like evaluations, but misalignment persists on agentic tasks. Three mitigations are effective: (i) preventing the model from reward hacking; (ii) increasing the diversity of RLHF safety training; and (iii) "inoculation prompting", wherein framing reward hacking as acceptable behavior during training removes misaligned generalization even when reward hacking is learned.