Spratly Islands
Evaluating Large Language Models for IUCN Red List Species Information
Large Language Models (LLMs) are rapidly being adopted in conservation to address the biodiversity crisis, yet their reliability for species evaluation is uncertain. This study systematically validates five leading models on 21,955 species across four core IUCN Red List assessment components: taxonomy, conservation status, distribution, and threats. A critical paradox was revealed: models excelled at taxonomic classification (94.9%) but consistently failed at conservation reasoning (27.2% for status assessment). This knowledge-reasoning gap, evident across all models, suggests inherent architectural constraints, not just data limitations. Furthermore, models exhibited systematic biases favoring charismatic vertebrates, potentially amplifying existing conservation inequities. These findings delineate clear boundaries for responsible LLM deployment: they are powerful tools for information retrieval but require human oversight for judgment-based decisions. A hybrid approach is recommended, where LLMs augment expert capacity while human experts retain sole authority over risk assessment and policy.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.70)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Distributive Fairness in Large Language Models: Evaluating Alignment with Human Values
Hosseini, Hadi, Khanna, Samarth
The growing interest in employing large language models (LLMs) for decision-making in social and economic contexts has raised questions about their potential to function as agents in these domains. A significant number of societal problems involve the distribution of resources, where fairness, along with economic efficiency, plays a critical role in the desirability of outcomes. In this paper, we examine whether LLM responses adhere to fundamental fairness concepts such as equitability, envy-freeness, and Rawlsian maximin, and investigate their alignment with human preferences. We evaluate the performance of several LLMs, providing a comparative benchmark of their ability to reflect these measures. Our results demonstrate a lack of alignment between current LLM responses and human distributional preferences. Moreover, LLMs are unable to utilize money as a transferable resource to mitigate inequality. Nonetheless, we demonstrate a stark contrast when (some) LLMs are tasked with selecting from a predefined menu of options rather than generating one. In addition, we analyze the robustness of LLM responses to variations in semantic factors (e.g. intentions or personas) or non-semantic prompting changes (e.g. templates or orderings). Finally, we highlight potential strategies aimed at enhancing the alignment of LLM behavior with well-established fairness concepts.
- Europe > Austria > Vienna (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study > Negative Result (0.45)
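The abstract above names three fairness notions without defining them. The following is a minimal sketch of how each can be checked for one allocation, assuming indivisible goods and additive valuations on a common scale. This setting is an assumption: the paper's actual setup, including money as a transferable resource, is not reproduced here, and all function names are hypothetical.

```python
from itertools import product
from typing import List, Optional

# Assumption: indivisible goods, additive valuations on a common scale.
# valuations[i][g] is agent i's value for good g (hypothetical input format).
Valuations = List[List[float]]
# allocation[i] is the list of goods assigned to agent i.
Allocation = List[List[int]]


def utility(valuations: Valuations, agent: int, bundle: List[int]) -> float:
    """Additive utility of `agent` for a bundle of goods."""
    return sum(valuations[agent][g] for g in bundle)


def is_envy_free(valuations: Valuations, allocation: Allocation) -> bool:
    """No agent values another agent's bundle strictly more than their own."""
    n = len(allocation)
    for i in range(n):
        own = utility(valuations, i, allocation[i])
        for j in range(n):
            if i != j and utility(valuations, i, allocation[j]) > own:
                return False
    return True


def is_equitable(valuations: Valuations, allocation: Allocation,
                 tol: float = 1e-9) -> bool:
    """Every agent derives (approximately) the same utility from their own bundle."""
    utils = [utility(valuations, i, b) for i, b in enumerate(allocation)]
    return max(utils) - min(utils) <= tol


def maximin_value(valuations: Valuations, allocation: Allocation) -> float:
    """Utility of the worst-off agent, the quantity Rawlsian maximin maximizes."""
    return min(utility(valuations, i, b) for i, b in enumerate(allocation))


def best_maximin_allocation(valuations: Valuations) -> Optional[Allocation]:
    """Brute-force search over all n**m assignments; only viable for tiny instances."""
    n, m = len(valuations), len(valuations[0])
    best: Optional[Allocation] = None
    best_val = float("-inf")
    for assignment in product(range(n), repeat=m):
        # assignment[g] is the agent who receives good g
        alloc = [[g for g in range(m) if assignment[g] == i] for i in range(n)]
        val = maximin_value(valuations, alloc)
        if val > best_val:
            best, best_val = alloc, val
    return best


if __name__ == "__main__":
    vals = [[6, 2, 2], [2, 4, 4]]  # two agents, three goods (made-up numbers)
    alloc = [[0], [1, 2]]
    print(is_envy_free(vals, alloc))      # True
    print(is_equitable(vals, alloc))      # False (utilities 6 and 8)
    print(maximin_value(vals, alloc))     # 6
    print(best_maximin_allocation(vals))  # [[0], [1, 2]]
```

In this toy instance the allocation is envy-free and maximin-optimal but not equitable. The three notions can therefore disagree, which is why a benchmark has to score them separately.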
We tried out DeepSeek. It works well, until we asked it about Tiananmen Square and Taiwan
The launch of a new chatbot by Chinese artificial intelligence firm DeepSeek triggered a plunge in US tech stocks as it appeared to perform as well as OpenAI's ChatGPT and other AI models while using fewer resources. By Monday, DeepSeek's AI assistant had rapidly overtaken ChatGPT as the most popular free app in Apple's US and UK app stores. Despite its popularity with international users, the app appears to censor answers to sensitive questions about China and its government. Chinese generative AI must not contain content that violates the country's "core socialist values", according to a technical document published by the national cybersecurity standards committee. That includes content that "incites to subvert state power and overthrow the socialist system", or "endangers national security and interests and damages the national image".
- Asia > Taiwan (0.46)
- North America > United States (0.16)
- Asia > China > Tibet Autonomous Region (0.06)
- Government > Military (0.56)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.53)
- Government > Regional Government (0.52)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.56)
BordIRlines: A Dataset for Evaluating Cross-lingual Retrieval-Augmented Generation
Li, Bryan, Haider, Samar, Luo, Fiona, Agashe, Adwait, Callison-Burch, Chris
Large language models excel at creative generation but continue to struggle with the issues of hallucination and bias. While retrieval-augmented generation (RAG) provides a framework for grounding LLMs' responses in accurate and up-to-date information, it still raises the question of bias: which sources should be selected for inclusion in the context? And how should their importance be weighted? In this paper, we study the challenge of cross-lingual RAG and present a dataset to investigate the robustness of existing systems at answering queries about geopolitical disputes, which exist at the intersection of linguistic, cultural, and political boundaries. Our dataset is sourced from Wikipedia pages containing information relevant to the given queries and we investigate the impact of including additional context, as well as the composition of this context in terms of language and source, on an LLM's response. Our results show that existing RAG systems continue to be challenged by cross-lingual use cases and suffer from a lack of consistency when they are provided with competing information in multiple languages. We present case studies to illustrate these issues and outline steps for future research to address these challenges. We make our dataset and code publicly available at https://github.com/manestay/bordIRlines.
- Europe > United Kingdom (0.28)
- Africa > Middle East > Morocco (0.14)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
This Land is {Your, My} Land: Evaluating Geopolitical Biases in Language Models
Li, Bryan, Callison-Burch, Chris
Do the Spratly Islands belong to China, the Philippines, or Vietnam? A pretrained large language model (LLM) may answer differently if asked in the languages of each claimant country: Chinese, Tagalog, or Vietnamese. This contrasts with a multilingual human, who would likely answer consistently. In this work, we show that LLMs recall geopolitical knowledge inconsistently across languages -- a phenomenon we term geopolitical bias. As a targeted case study, we consider territorial disputes, an inherently controversial and cross-lingual task. We first introduce the BorderLines dataset of territorial disputes. This covers 256 territories, each of which is associated with a set of multiple-choice questions in the languages of each claimant country (48 languages total). We then pose these questions to LLMs to probe their internal knowledge. Finally, we propose a suite of evaluation metrics based on accuracy, which compares responses against the actual geopolitical situation, and on the consistency of responses across languages. These metrics allow us to quantify several findings, which include instruction-tuned LLMs underperforming base ones, and geopolitical bias being amplified in stronger models. We release our code and dataset to facilitate future investigation and mitigation of geopolitical bias.
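The abstract describes two kinds of metric: accuracy against the actual geopolitical situation, and consistency across languages. Below is a minimal sketch of one plausible way to compute both for a single territory, assuming each answer has already been mapped to a claimant name. The input format, the function names, and the pairwise-agreement definition of consistency are illustrative assumptions, not the BorderLines paper's exact metrics.

```python
from itertools import combinations
from typing import Dict

# Hypothetical input format: responses maps a query language to the claimant
# the model chose when the same multiple-choice question was asked in it.
Responses = Dict[str, str]


def accuracy(responses: Responses, controller: str) -> float:
    """Fraction of languages whose answer names the territory's actual controller."""
    return sum(ans == controller for ans in responses.values()) / len(responses)


def pairwise_consistency(responses: Responses) -> float:
    """Fraction of language pairs that gave the same answer (1.0 = fully consistent)."""
    pairs = list(combinations(responses.values(), 2))
    if not pairs:
        return 1.0  # a single language is trivially consistent with itself
    return sum(a == b for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Made-up responses for a hypothetical territory claimed by countries A and B.
    resp = {"lang_A": "A", "lang_B": "B", "en": "A"}
    print(round(accuracy(resp, controller="A"), 3))  # 0.667
    print(round(pairwise_consistency(resp), 3))      # 0.333
```

Under this definition, a model that names a different claimant in each claimant's language, as in the Spratly Islands example above, scores 0.0 on consistency, whatever its accuracy.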
EXCLUSIVE: New satellite imagery shows Chinese drone on contested island
EXCLUSIVE: New satellite imagery obtained by Fox News shows that China, for the first time, has deployed a drone with stealth technology to a contested island in the South China Sea, in another sign of escalating tensions in the region. The new development comes as President Obama visits Japan. He lifted an arms embargo against Vietnam while visiting Hanoi earlier this week, drawing criticism from the Chinese government about stoking tensions in the region. The newly obtained satellite images from ImageSat International (ISI) show a Chinese Harbin BZK-005 long range reconnaissance drone on Woody Island in the South China Sea. The Chinese drone did not appear armed in the satellite image taken last month.
- Pacific Ocean > North Pacific Ocean > South China Sea (0.54)
- Asia > Vietnam > Hanoi > Hanoi (0.25)
- Asia > Japan (0.25)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Military (1.00)
- Government > Regional Government > Asia Government > China Government (0.93)
- Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.61)
Pakistani Researcher Solves One of the Most Important Maths Problems of the 20th Century
Earlier this year, the Institute of Electrical and Electronics Engineers (IEEE) published "AI's 10 to Watch", a list of 10 people doing phenomenal work in artificial intelligence. Pakistani researcher Haris Aziz, a LUMS graduate, was named on this prestigious list for his work in computational social choice, an intersection of artificial intelligence and economics. It seems that was just the beginning for Haris Aziz, who is now back in the news for solving an 'unsolvable' mathematical problem. Who should get the larger share of the profit from a business? Should it be divided equally or otherwise? Or perhaps it's your child's birthday and it's time to cut the cake so that none of the children is unhappy with their share?
- Pacific Ocean > North Pacific Ocean > South China Sea (0.06)
- North America > United States > New York (0.06)
- Asia > Spratly Islands (0.06)
China planning base station for Spratly advanced rescue vessel
BEIJING – A Chinese government bureau is planning a base station for an advanced rescue ship in the disputed Spratly Islands, state media reported on Monday, as China continues its push to develop civilian and military infrastructure in the contentious region. The ship, which would carry drones and underwater robots, is set to be deployed in the second half of the year, said Chen Xingguang, political commissar of the ship, which is under the South China Sea Rescue Bureau of the Ministry of Transport, according to the official China Daily. The civilian bureau has 31 ships and four helicopters conducting rescue missions in the South China Sea, and officials from the department told the China Daily they work with the military on such efforts. Officials said the rescue ship base station would enable rescue forces to aid fishing boats in trouble, and shorten the distance they need to travel. It is unclear on which island the ship will be based, but China has carried out land reclamation and construction on several islands in the Spratly Archipelago, parts of which are also claimed by the Philippines, Vietnam, Brunei, Malaysia and Taiwan.
- Pacific Ocean > North Pacific Ocean > South China Sea (0.58)
- Asia > China > Beijing > Beijing (0.35)
- Asia > Vietnam (0.27)
- Government > Military (0.56)
- Transportation > Air (0.39)
- Government > Regional Government > Asia Government > China Government (0.39)
- Aerospace & Defense > Aircraft (0.39)