Goto

Collaborating Authors

 Seychelles


Trump says UK's Starmer making 'a big mistake' with Chagos Islands deal

Al Jazeera

Trump says UK's Starmer making'a big mistake' with Chagos Islands deal Donald Trump has criticised the United Kingdom's plan to hand over the Chagos Islands to Mauritius, a day after the United States Department of State gave its official approval of the deal. The US president said on Wednesday that Prime Minister Keir Starmer was "making a big mistake" in the agreement to return sovereignty of the archipelago to Mauritius, and lease back the island of Diego Garcia, which is home to a UK-US military base. The Indian Ocean archipelago became part of British territory in 1814, with the UK detaching it from Mauritius before it gained independence in the 1960s. It then worked with the US to force the islands' residents to leave, in order to build a military base on Diego Garcia, which it had leased to the US. Mauritius won its legal battle for sovereignty over the islands in 2019, and the International Court of Justice (ICJ) urged the UK to cede control.


FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models

Pyo, Jiyoon, Jiao, Yuankun, Jung, Dongwon, Li, Zekun, Jang, Leeje, Kirsanova, Sofia, Kim, Jina, Lin, Yijun, Liu, Qin, Xie, Junyi, Askari, Hadi, Xu, Nan, Chen, Muhao, Chiang, Yao-Yi

arXiv.org Artificial Intelligence

Cartographic reasoning is the skill of interpreting geographic relationships by aligning legends, map scales, compass directions, map texts, and geometries across one or more map images. Although essential as a concrete cognitive capability and for critical tasks such as disaster response and urban planning, it remains largely unevaluated. Building on progress in chart and infographic understanding, recent large vision language model studies on map visual question-answering often treat maps as a special case of charts. In contrast, map VQA demands comprehension of layered symbology (e.g., symbols, geometries, and text labels) as well as spatial relations tied to orientation and distance that often span multiple maps and are not captured by chart-style evaluations. To address this gap, we introduce FRIEDA, a benchmark for testing complex open-ended cartographic reasoning in LVLMs. FRIEDA sources real map images from documents and reports in various domains and geographical areas. Following classifications in Geographic Information System (GIS) literature, FRIEDA targets all three categories of spatial relations: topological (border, equal, intersect, within), metric (distance), and directional (orientation). All questions require multi-step inference, and many require cross-map grounding and reasoning. We evaluate eleven state-of-the-art LVLMs under two settings: (1) the direct setting, where we provide the maps relevant to the question, and (2) the contextual setting, where the model may have to identify the maps relevant to the question before reasoning. Even the strongest models, Gemini-2.5-Pro and GPT-5-Think, achieve only 38.20% and 37.20% accuracy, respectively, far below human performance of 84.87%. These results reveal a persistent gap in multi-step cartographic reasoning, positioning FRIEDA as a rigorous benchmark to drive progress on spatial intelligence in LVLMs.


Democratic or Authoritarian? Probing a New Dimension of Political Biases in Large Language Models

Piedrahita, David Guzman, Strauss, Irene, Schölkopf, Bernhard, Mihalcea, Rada, Jin, Zhijing

arXiv.org Artificial Intelligence

As Large Language Models (LLMs) become increasingly integrated into everyday life and information ecosystems, concerns about their implicit biases continue to persist. While prior work has primarily examined socio-demographic and left--right political dimensions, little attention has been paid to how LLMs align with broader geopolitical value systems, particularly the democracy--authoritarianism spectrum. In this paper, we propose a novel methodology to assess such alignment, combining (1) the F-scale, a psychometric tool for measuring authoritarian tendencies, (2) FavScore, a newly introduced metric for evaluating model favorability toward world leaders, and (3) role-model probing to assess which figures are cited as general role-models by LLMs. We find that LLMs generally favor democratic values and leaders, but exhibit increased favorability toward authoritarian figures when prompted in Mandarin. Further, models are found to often cite authoritarian figures as role models, even outside explicit political contexts. These results shed light on ways LLMs may reflect and potentially reinforce global political ideologies, highlighting the importance of evaluating bias beyond conventional socio-political axes. Our code is available at: https://github.com/irenestrauss/Democratic-Authoritarian-Bias-LLMs.



Unlocking the Potential of Global Human Expertise

Neural Information Processing Systems

For example, in the Pandemic Response Challenge experiment, the context consisted of data about the geographic region for which the predictions were made, e.g., historical data of COVID-19 cases and intervention policies; actions were future schedules of intervention policies for the region; and outcomes were predicted future cases of COVID-19 along with the stringency


Language Specific Knowledge: Do Models Know Better in X than in English?

Agarwal, Ishika, Bozdag, Nimet Beyza, Hakkani-Tür, Dilek

arXiv.org Artificial Intelligence

Often, multilingual language models are trained with the objective to map semantically similar content (in different languages) in the same latent space. In this paper, we show a nuance in this training objective, and find that by changing the language of the input query, we can improve the question answering ability of language models. Our contributions are two-fold. First, we introduce the term Language Specific Knowledge (LSK) to denote queries that are best answered in an "expert language" for a given LLM, thereby enhancing its question-answering ability. We introduce the problem of language selection -- for some queries, language models can perform better when queried in languages other than English, sometimes even better in low-resource languages -- and the goal is to select the optimal language for the query. Second, we introduce simple to strong baselines to test this problem. Additionally, as a first-pass solution to this novel problem, we design LSKExtractor to benchmark the language-specific knowledge present in a language model and then exploit it during inference. To test our framework, we employ three datasets that contain knowledge about both cultural and social behavioral norms. Overall, LSKExtractor achieves up to 10% relative improvement across datasets, and is competitive against strong baselines, while being feasible in real-world settings. Broadly, our research contributes to the open-source development (https://github.com/agarwalishika/LSKExtractor/tree/main) of language models that are inclusive and more aligned with the cultural and linguistic contexts in which they are deployed.


Quantifying Climate Policy Action and Its Links to Development Outcomes: A Cross-National Data-Driven Analysis

Dutta, Aditi

arXiv.org Artificial Intelligence

Addressing climate change effectively requires more than cataloguing the number of policies in place; it calls for tools that can reveal their thematic priorities and their tangible impacts on development outcomes. Existing assessments often rely on qualitative descriptions or composite indices, which can mask crucial differences between key domains such as mitigation, adaptation, disaster risk management, and loss and damage. To bridge this gap, we develop a quantitative indicator of climate policy orientation by applying a multilingual transformer-based language model to official national policy documents, achieving a classification accuracy of 0.90 (F1-score). Linking these indicators with World Bank development data in panel regressions reveals that mitigation policies are associated with higher GDP and GNI; disaster risk management correlates with greater GNI and debt but reduced foreign direct investment; adaptation and loss and damage show limited measurable effects. This integrated NLP-econometric framework enables comparable, theme-specific analysis of climate governance, offering a scalable method to monitor progress, evaluate trade-offs, and align policy emphasis with development goals.


AI Diffusion in Low Resource Language Countries

Misra, Amit, Zamir, Syed Waqas, Hamidouche, Wassim, Becker-Reshef, Inbal, Ferres, Juan Lavista

arXiv.org Artificial Intelligence

Artificial intelligence (AI) is diffusing globally at unprecedented speed, but adoption remains uneven. Frontier Large Language Models (LLMs) are known to perform poorly on low-resource languages due to data scarcity. We hypothesize that this performance deficit reduces the utility of AI, thereby slowing adoption in Low-Resource Language Countries (LRLCs). To test this, we use a weighted regression model to isolate the language effect from socioeconomic and demographic factors, finding that LRLCs have a share of AI users that is approximately 20% lower relative to their baseline. These results indicate that linguistic accessibility is a significant, independent barrier to equitable AI diffusion.


DITTO: A Spoofing Attack Framework on Watermarked LLMs via Knowledge Distillation

Ahn, Hyeseon, Park, Shinwoo, Woo, Suyeon, Han, Yo-Sub

arXiv.org Artificial Intelligence

The promise of LLM watermarking rests on a core assumption that a specific watermark proves authorship by a specific model. We demonstrate that this assumption is dangerously flawed. We introduce the threat of watermark spoofing, a sophisticated attack that allows a malicious model to generate text containing the authentic-looking watermark of a trusted, victim model. This enables the seamless misattribution of harmful content, such as disinformation, to reputable sources. The key to our attack is repurposing watermark radioactivity, the unintended inheritance of data patterns during fine-tuning, from a discoverable trait into an attack vector. By distilling knowledge from a watermarked teacher model, our framework allows an attacker to steal and replicate the watermarking signal of the victim model. This work reveals a critical security gap in text authorship verification and calls for a paradigm shift towards technologies capable of distinguishing authentic watermarks from expertly imitated ones. Our code is available at https://github.com/hsannn/ditto.git.


Open Korean Historical Corpus: A Millennia-Scale Diachronic Collection of Public Domain Texts

Song, Seyoung, Kim, Nawon, Chae, Songeun, Park, Kiwoong, Jin, Jiho, Yoo, Haneul, Cho, Kyunghyun, Oh, Alice

arXiv.org Artificial Intelligence

The history of the Korean language is characterized by a discrepancy between its spoken and written forms and a pivotal shift from Chinese characters to the Hangul alphabet. However, this linguistic evolution has remained largely unexplored in NLP due to a lack of accessible historical corpora. To address this gap, we introduce the Open Korean Historical Corpus, a large-scale, openly licensed dataset spanning 1,300 years and 6 languages, as well as under-represented writing systems like Korean-style Sinitic (Idu) and Hanja-Hangul mixed script. This corpus contains 18 million documents and 5 billion tokens from 19 sources, ranging from the 7th century to 2025. We leverage this resource to quantitatively analyze major linguistic shifts: (1) Idu usage peaked in the 1860s before declining sharply; (2) the transition from Hanja to Hangul was a rapid transformation starting around 1890; and (3) North Korea ' s lexical divergence causes modern tokenizers to produce up to 51 times higher out-of-vocabulary rates. This work provides a foundational resource for quantitative diachronic analysis by capturing the history of the Korean language. Moreover, it can serve as a pre-training corpus for large language models, potentially improving their understanding of Sino-Korean vocabulary in modern Hangul as well as archaic writing systems.