Harare
US's new scramble for Africa is biomedical imperialism
Late in February, Zimbabwe pulled out of a proposed $367m United States health funding agreement after objecting to provisions requiring broad American access to sensitive health data. The five-year programme was presented as support for HIV/AIDS, tuberculosis, malaria and epidemic preparedness efforts. However, the terms demanded extensive sharing of national health intelligence, including epidemiological surveillance data and pathogen samples, while offering no binding guarantees that Zimbabwe would receive equitable access to medical technologies developed from them. Harare called the proposal an "unequal exchange", warning that Zimbabwe risked supplying the "raw materials for scientific discovery" while the resulting benefits could remain concentrated in the United States and global pharmaceutical firms. Critics increasingly describe this pattern as biomedical extractivism: a toxic combination of exploitative research practices and colonial thinking that reinforces Western dominance.
Meet Britain's real-life SUPERVILLAIN: Eccentric millionaire lives in a bunker beneath a Cold War radar - and is convinced he's going to find UFOs
Some millionaires might be happy frittering away their hard-earned cash on speed boats, golfing holidays, and perhaps the odd football team or two. But William Sachiti is far from your run-of-the-mill businessman. Much more Blofeld than Bill Gates, Mr Sachiti has decided to use his millions in a far less conventional way. Naturally, that meant buying a Cold War RAF base and firing up the radar station to hunt for UFOs. From his 'supervillain lair' in the nuclear bunker beneath former RAF Neatishead, Norfolk, Mr Sachiti is building what may be the world's most sophisticated UFO-hunting machine.
MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models
Dihan, Mahir Labib, Hassan, Md Tanvir, Parvez, Md Tanvir, Hasan, Md Hasebul, Alam, Md Almash, Cheema, Muhammad Aamir, Ali, Mohammed Eunus, Parvez, Md Rizwan
Recent advancements in foundation models have enhanced AI systems' capabilities in autonomous tool usage and reasoning. However, their ability in location- or map-based reasoning, which improves daily life by optimizing navigation, facilitating resource discovery, and streamlining logistics, has not been systematically studied. To bridge this gap, we introduce MapEval, a benchmark designed to assess diverse and complex map-based user queries with geo-spatial reasoning. MapEval features three task types (textual, API-based, and visual) that require collecting world information via map tools, processing heterogeneous geo-spatial contexts (e.g., named entities, travel distances, user reviews or ratings, images), and compositional reasoning, all of which state-of-the-art foundation models find challenging. Comprising 700 unique multiple-choice questions about locations across 180 cities and 54 countries, MapEval evaluates foundation models' ability to handle spatial relationships, map infographics, travel planning, and navigation challenges. Using MapEval, we conducted a comprehensive evaluation of 28 prominent foundation models. While no single model excelled across all tasks, Claude-3.5-Sonnet, GPT-4o, and Gemini-1.5-Pro achieved competitive performance overall. However, substantial performance gaps emerged, particularly on the API-based tasks, where agents with Claude-3.5-Sonnet outperformed GPT-4o and Gemini-1.5-Pro by 16% and 21%, respectively, and the gaps became even more amplified when compared to open-source LLMs. Our detailed analyses provide insights into the strengths and weaknesses of current models, though all models still fall short of human performance by more than 20% on average, struggling with complex map images and rigorous geo-spatial reasoning. This gap highlights MapEval's critical role in advancing general-purpose foundation models with stronger geo-spatial understanding.
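Because MapEval is scored as multiple-choice accuracy per task type, per-model comparisons reduce to simple arithmetic. The sketch below computes accuracy per task type and the gap between two models. It is illustrative only: the record format, the task-type keys, and the sample answers are all hypothetical, and it reports gaps in percentage points, which is an assumption since the abstract does not say whether its 16% and 21% figures are absolute or relative.

```python
from collections import defaultdict


def accuracy_by_task(records):
    """Return {task_type: accuracy in percent} for multiple-choice answers.

    Each record is a hypothetical (task_type, predicted_option, correct_option) tuple.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for task, predicted, gold in records:
        total[task] += 1
        correct[task] += int(predicted == gold)
    return {task: 100.0 * correct[task] / total[task] for task in total}


def gap_in_points(acc_a, acc_b):
    """Percentage-point difference (model A minus model B) on shared task types."""
    return {task: acc_a[task] - acc_b[task] for task in acc_a.keys() & acc_b.keys()}


# Made-up answers for two models, used only to exercise the functions.
model_a = [("api", "B", "B"), ("api", "C", "C"), ("visual", "A", "D"), ("textual", "A", "A")]
model_b = [("api", "B", "B"), ("api", "A", "C"), ("visual", "D", "D"), ("textual", "A", "A")]
print(gap_in_points(accuracy_by_task(model_a), accuracy_by_task(model_b)))
```

Reporting relative improvement instead would divide each gap by model B's accuracy. Which convention a benchmark uses matters when comparing headline numbers.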
Measuring Bias of Web-filtered Text Datasets and Bias Propagation Through Training
Mansour, Youssef, Heckel, Reinhard
We investigate biases in pretraining datasets for large language models (LLMs) through dataset classification experiments. Building on prior work demonstrating the existence of biases in popular computer vision datasets, we analyze popular open-source pretraining datasets for LLMs derived from CommonCrawl including C4, RefinedWeb, DolmaCC, RedPajama-V2, FineWeb, and DCLM-Baseline. Despite those datasets being obtained with similar filtering and deduplication steps, neural networks can classify surprisingly well which dataset a single text sequence belongs to, significantly better than a human can. This indicates that popular pretraining datasets have their own unique biases or fingerprints. Those biases remain even when the text is rewritten with LLMs. Moreover, these biases propagate through training: Random sequences generated by models trained on those datasets can be classified well by a classifier trained on the original datasets.
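The dataset classification experiment asks a model to predict which pretraining corpus a single text sequence came from. The authors use neural networks. The sketch below swaps in a much simpler multinomial Naive Bayes classifier over whitespace tokens, purely to show the shape of the task. The class name, the corpus labels `corpus_a`/`corpus_b`, and the toy sentences are hypothetical placeholders, not samples from C4, FineWeb, or the other datasets.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSourceClassifier:
    """Predicts which corpus a text sequence came from (multinomial Naive Bayes).

    A stand-in for the neural classifiers used in the paper, not their method.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace smoothing constant
        self.word_counts = defaultdict(Counter)   # corpus label -> token counts
        self.doc_counts = Counter()               # corpus label -> number of sequences
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            total = sum(counts.values())
            # Log prior plus smoothed log likelihood of every token.
            score = math.log(self.doc_counts[label] / n_docs)
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / (total + self.alpha * vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data; real experiments would use held-out sequences from each corpus.
texts = ["the cat sat on the mat", "a cat and a mat",
         "stock prices rose sharply today", "markets and prices fell"]
labels = ["corpus_a", "corpus_a", "corpus_b", "corpus_b"]
clf = NaiveBayesSourceClassifier().fit(texts, labels)
print(clf.predict("the cat on a mat"))
```

The paper's propagation test follows the same pattern. A classifier fit on the original corpora is applied to text sampled from models trained on them, and above-chance accuracy indicates that the fingerprint survived training.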
Probing Language Models on Their Knowledge Source
Tighidet, Zineddine, Mogini, Andrea, Mei, Jiali, Piwowarski, Benjamin, Gallinari, Patrick
Large Language Models (LLMs) often encounter conflicts between their learned, internal knowledge (parametric knowledge, PK) and external knowledge provided during inference (contextual knowledge, CK). Understanding how LLMs prioritize one knowledge source over the other remains a challenge. In this paper, we propose a novel probing framework to explore the mechanisms governing the selection between PK and CK in LLMs. Using controlled prompts designed to contradict the model's PK, we demonstrate that specific model activations are indicative of the knowledge source employed. We evaluate this framework on various LLMs of different sizes and demonstrate that mid-layer activations, particularly those related to relations in the input, are crucial in predicting knowledge source selection, paving the way for more reliable models capable of handling knowledge conflicts effectively.
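A probe in this sense is a small classifier trained on a model's internal activations to predict a property, here whether the model answered from context (CK) or from memory (PK). The sketch below fits a logistic-regression probe by stochastic gradient descent. Assumptions: activations have already been extracted as fixed-length vectors (extraction is not shown), the 2-dimensional "activations" are invented toy values, and logistic regression is a common probe choice that the abstract does not confirm as the authors' exact classifier.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(activations, labels, lr=0.1, epochs=500):
    """Fit a logistic-regression probe for P(context used | activation vector).

    labels: 1 if the model followed the contextual knowledge (CK), 0 for PK.
    """
    dim = len(activations[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(activations, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict_source(w, b, x):
    """Return 'CK' if the probe predicts the contextual source, else 'PK'."""
    return "CK" if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else "PK"


# Invented toy activations: first three followed the context, last three did not.
acts = [[1.0, 0.2], [0.9, 0.1], [1.1, 0.3], [-1.0, 0.1], [-0.8, 0.2], [-1.2, 0.0]]
labels = [1, 1, 1, 0, 0, 0]
w, b = train_probe(acts, labels)
print(predict_source(w, b, [0.95, 0.2]), predict_source(w, b, [-0.9, 0.1]))
```

To locate where the signal lives, one would typically train one such probe per layer and compare held-out accuracy across layers. The paper reports that mid-layer activations are the most informative.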
Deep Learning Meets OBIA: Tasks, Challenges, Strategies, and Perspectives
Ma, Lei, Yan, Ziyun, Li, Mengmeng, Liu, Tao, Tan, Liqin, Wang, Xuan, He, Weiqiang, Wang, Ruikun, He, Guangjun, Lu, Heng, Blaschke, Thomas
Deep learning has gained significant attention in remote sensing, especially in pixel- or patch-level applications. Despite initial attempts to integrate deep learning into object-based image analysis (OBIA), its full potential there remains largely unexplored. As OBIA usage becomes more widespread, this article presents a comprehensive review and expansion of its task subdomains, with or without the integration of deep learning. We also identify and summarize five prevailing strategies for addressing deep learning's limited ability to process unstructured object data directly within OBIA, and recommend important directions for future research. Our goal is to inspire more exploration in this fascinating yet overlooked area and to facilitate the integration of deep learning into OBIA processing workflows.
SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation
Yao, Zijun, Qi, Weijian, Pan, Liangming, Cao, Shulin, Hu, Linmei, Liu, Weichuan, Hou, Lei, Li, Juanzi
This paper introduces Self-aware Knowledge Retrieval (SeaKR), a novel adaptive RAG model that extracts the self-aware uncertainty of LLMs from their internal states. SeaKR activates retrieval when the LLM exhibits high self-aware uncertainty during generation. To effectively integrate retrieved knowledge snippets, SeaKR re-ranks them based on the LLM's self-aware uncertainty and keeps the snippet that reduces that uncertainty the most. To facilitate solving complex tasks that require multiple retrievals, SeaKR also uses the LLM's self-aware uncertainty to choose among different reasoning strategies. Our experiments on both complex and simple question answering datasets show that SeaKR outperforms existing adaptive RAG methods. We release our code at https://github.com/THU-KEG/SeaKR.
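The two core SeaKR steps are gating retrieval on uncertainty and re-ranking snippets by how much they reduce that uncertainty. They can be sketched as a small control loop. This is not the released implementation. SeaKR derives uncertainty from the LLM's internal states, whereas here `uncertainty` is a hypothetical pluggable callable, and `toy_uncertainty` (a word-overlap heuristic), `toy_retrieve`, and the 0.5 threshold are invented for illustration. The multi-retrieval reasoning-strategy selection is not covered.

```python
from typing import Callable, List, Optional

# Hypothetical signature: uncertainty score for answering `question` given an
# optional context snippet (None means no retrieved context).
UncertaintyFn = Callable[[str, Optional[str]], float]


def should_retrieve(question: str, uncertainty: UncertaintyFn, threshold: float) -> bool:
    """Trigger retrieval only when the model is uncertain without context."""
    return uncertainty(question, None) > threshold


def rerank_snippets(question: str, snippets: List[str], uncertainty: UncertaintyFn) -> List[str]:
    """Order snippets so the one leaving the model least uncertain comes first."""
    return sorted(snippets, key=lambda s: uncertainty(question, s))


def select_context(question: str, retrieve: Callable[[str], List[str]],
                   uncertainty: UncertaintyFn, threshold: float = 0.5) -> Optional[str]:
    """Return the best snippet, or None to answer from parametric knowledge."""
    if not should_retrieve(question, uncertainty, threshold):
        return None
    ranked = rerank_snippets(question, retrieve(question), uncertainty)
    return ranked[0] if ranked else None


def toy_uncertainty(question: str, context: Optional[str]) -> float:
    """Invented stand-in: 1 minus the fraction of question words found in the context."""
    if context is None:
        return 0.9
    q_words = set(question.lower().split())
    overlap = q_words & set(context.lower().split())
    return 1.0 - len(overlap) / len(q_words)


def toy_retrieve(question: str) -> List[str]:
    return ["paris is in france", "the capital of france is paris", "bananas are yellow"]


print(select_context("what is the capital of france", toy_retrieve, toy_uncertainty))
```

Keeping the uncertainty estimator behind a single callable lets the same gating and re-ranking logic run with any estimator, including one computed from hidden states as the paper describes.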
Low-resourced Languages and Online Knowledge Repositories: A Need-Finding Study
Nigatu, Hellina Hailu, Canny, John, Chasins, Sarah E.
Online Knowledge Repositories (OKRs) like Wikipedia offer communities a way to share and preserve information about themselves and their ways of living. However, for communities with low-resourced languages, including most African communities, the quality and volume of available content are often inadequate. One reason for this could be that many OKRs embody Western ways of preserving and sharing knowledge, requiring low-resourced language communities to adapt to new interactions. To understand the challenges faced by low-resourced language contributors on the popular OKR Wikipedia, we conducted (1) a thematic analysis of Wikipedia forum discussions and (2) a contextual inquiry study with 14 novice contributors. We focused on three Ethiopian languages: Afan Oromo, Amharic, and Tigrinya. Our analysis revealed several recurring themes. For example, contributors struggle to find resources to corroborate their articles in low-resourced languages, and language technology support, such as translation systems and spellcheck, introduces errors that waste contributors' time. We hope our study will help designers make online knowledge repositories accessible to low-resourced language speakers.
Toward a Critical Toponymy Framework for Named Entity Recognition: A Case Study of Airbnb in New York City
Brunila, Mikael, LaViolette, Jack, CH-Wang, Sky, Verma, Priyanka, Féré, Clara, McKenzie, Grant
Critical toponymy examines the dynamics of power, capital, and resistance through place names and the sites to which they refer. Studies in this field have traditionally focused on the semantic content of toponyms and the top-down institutional processes that produce them. However, they have generally ignored the ways in which ordinary people use toponyms in everyday discourse, as well as the other strategies of geospatial description that accompany and contextualize toponymic reference. Here, we develop computational methods to measure how cultural and economic capital shape the ways in which people refer to places, using a novel annotated dataset of 47,440 New York City Airbnb listings from the 2010s. Building on this dataset, we introduce a new named entity recognition (NER) model able to identify important discourse categories integral to the characterization of place. Our findings point toward new directions for critical toponymy and to a range of previously understudied linguistic signals relevant to research on neighborhood status, housing and tourism markets, and gentrification.
The US, not China, should take the lead on AI
Gordon Chang, a senior fellow at the Gatestone Institute, joined 'Cavuto Live' to discuss the U.S.'s relationship with China amid the highly anticipated G20 Summit. Emerging technologies like artificial intelligence (AI) should be used as "tools of opportunity, not as weapons of oppression," President Biden remarked recently. But this exhortation makes his subsequent vow to work directly with "our competitors" to harness the power of AI "for good" all the more curious. Working with our competitors, like China, would only empower the Chinese Communist Party (CCP) to write the rules of the road for AI. And we don't want China in the driver's seat.