Antigua and Barbuda
Democratic or Authoritarian? Probing a New Dimension of Political Biases in Large Language Models
Piedrahita, David Guzman, Strauss, Irene, Schölkopf, Bernhard, Mihalcea, Rada, Jin, Zhijing
As Large Language Models (LLMs) become increasingly integrated into everyday life and information ecosystems, concerns about their implicit biases continue to persist. While prior work has primarily examined socio-demographic and left--right political dimensions, little attention has been paid to how LLMs align with broader geopolitical value systems, particularly the democracy--authoritarianism spectrum. In this paper, we propose a novel methodology to assess such alignment, combining (1) the F-scale, a psychometric tool for measuring authoritarian tendencies, (2) FavScore, a newly introduced metric for evaluating model favorability toward world leaders, and (3) role-model probing to assess which figures are cited as general role-models by LLMs. We find that LLMs generally favor democratic values and leaders, but exhibit increased favorability toward authoritarian figures when prompted in Mandarin. Further, models are found to often cite authoritarian figures as role models, even outside explicit political contexts. These results shed light on ways LLMs may reflect and potentially reinforce global political ideologies, highlighting the importance of evaluating bias beyond conventional socio-political axes. Our code is available at: https://github.com/irenestrauss/Democratic-Authoritarian-Bias-LLMs.
- North America > Cuba (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- Asia > Middle East > Syria (0.14)
- (185 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- Law (0.67)
- Government > Regional Government > Asia Government > Middle East Government (0.46)
Language Specific Knowledge: Do Models Know Better in X than in English?
Agarwal, Ishika, Bozdag, Nimet Beyza, Hakkani-Tür, Dilek
Often, multilingual language models are trained with the objective to map semantically similar content (in different languages) in the same latent space. In this paper, we show a nuance in this training objective, and find that by changing the language of the input query, we can improve the question answering ability of language models. Our contributions are two-fold. First, we introduce the term Language Specific Knowledge (LSK) to denote queries that are best answered in an "expert language" for a given LLM, thereby enhancing its question-answering ability. We introduce the problem of language selection -- for some queries, language models can perform better when queried in languages other than English, sometimes even better in low-resource languages -- and the goal is to select the optimal language for the query. Second, we introduce simple to strong baselines to test this problem. Additionally, as a first-pass solution to this novel problem, we design LSKExtractor to benchmark the language-specific knowledge present in a language model and then exploit it during inference. To test our framework, we employ three datasets that contain knowledge about both cultural and social behavioral norms. Overall, LSKExtractor achieves up to 10% relative improvement across datasets, and is competitive against strong baselines, while being feasible in real-world settings. Broadly, our research contributes to the open-source development (https://github.com/agarwalishika/LSKExtractor/tree/main) of language models that are inclusive and more aligned with the cultural and linguistic contexts in which they are deployed.
- Asia > Laos (0.28)
- Asia > South Korea (0.14)
- Asia > North Korea (0.14)
- (182 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
AI Diffusion in Low Resource Language Countries
Misra, Amit, Zamir, Syed Waqas, Hamidouche, Wassim, Becker-Reshef, Inbal, Ferres, Juan Lavista
Artificial intelligence (AI) is diffusing globally at unprecedented speed, but adoption remains uneven. Frontier Large Language Models (LLMs) are known to perform poorly on low-resource languages due to data scarcity. We hypothesize that this performance deficit reduces the utility of AI, thereby slowing adoption in Low-Resource Language Countries (LRLCs). To test this, we use a weighted regression model to isolate the language effect from socioeconomic and demographic factors, finding that LRLCs have a share of AI users that is approximately 20% lower relative to their baseline. These results indicate that linguistic accessibility is a significant, independent barrier to equitable AI diffusion.
- North America > The Bahamas (0.14)
- North America > United States > District of Columbia > Washington (0.05)
- South America > Venezuela (0.04)
- (186 more...)
Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models
Girrbach, Leander, Alaniz, Stephan, Smith, Genevieve, Darrell, Trevor, Akata, Zeynep
Vision-language models trained on large-scale multimodal datasets show strong demographic biases, but the role of training data in producing these biases remains unclear. A major barrier has been the lack of demographic annotations in web-scale datasets such as LAION-400M. We address this gap by creating person-centric annotations for the full dataset, including over 276 million bounding boxes, perceived gender and race/ethnicity labels, and automatically generated captions. These annotations are produced through validated automatic labeling pipelines combining object detection, multimodal captioning, and finetuned classifiers. Using them, we uncover demographic imbalances and harmful associations, such as the disproportionate linking of men and individuals perceived as Black or Middle Eastern with crime-related and negative content. We also show that 60-70% of gender bias in CLIP and Stable Diffusion can be linearly explained by direct co-occurrences in the data. Our resources establish the first large-scale empirical link between dataset composition and downstream model bias.
- Europe > United Kingdom (0.14)
- Europe > Czechia (0.14)
- Asia > Timor-Leste (0.14)
- (173 more...)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Vision > Face Recognition (0.92)
Evaluating Large Language Models for IUCN Red List Species Information
Large Language Models (LLMs) are rapidly being adopted in conservation to address the biodiversity crisis, yet their reliability for species evaluation is uncertain. This study systematically validates five leading models on 21,955 species across four core IUCN Red List assessment components: taxonomy, conservation status, distribution, and threats. A critical paradox was revealed: models excelled at taxonomic classification (94.9%) but consistently failed at conservation reasoning (27.2% for status assessment). This knowledge-reasoning gap, evident across all models, suggests inherent architectural constraints, not just data limitations. Furthermore, models exhibited systematic biases favoring charismatic vertebrates, potentially amplifying existing conservation inequities. These findings delineate clear boundaries for responsible LLM deployment: they are powerful tools for information retrieval but require human oversight for judgment-based decisions. A hybrid approach is recommended, where LLMs augment expert capacity while human experts retain sole authority over risk assessment and policy.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.70)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Deep learning four decades of human migration
W e present a novel and detailed dataset on origin-destination annual migration flows and stocks between 230 countries and regions, spanning the period from 1990 to the present. Our flow estimates are further disaggregated by country of birth, providing a comprehensive picture of migration over the last 35 years. The estimates are obtained by training a deep recurrent neural network to learn flow patterns from 18 covariates for all countries, including geographic, economic, cultural, societal, and political information. The recurrent architecture of the neural network means that the entire past can influence current migration patterns, allowing us to learn long-range temporal correlations. By training an ensemble of neural networks and additionally pushing uncertainty on the covariates through the trained network, we obtain confidence bounds for all our estimates, allowing researchers to pinpoint the geographic regions most in need of additional data collection. W e validate our approach on various test sets of unseen data, demonstrating that it significantly outperforms traditional methods estimating five-year flows while delivering a significant increase in temporal resolution. The model is fully open source: all training data, neural network weights, and training code are made public alongside the migration estimates, providing a valuable resource for future studies of human migration.
- Oceania > Australia (0.46)
- Europe > Isle of Man (0.28)
- Asia > Russia (0.28)
- (94 more...)
An Explainable and Interpretable Composite Indicator Based on Decision Rules
Corrente, Salvatore, Greco, Salvatore, Słowiński, Roman, Zappalà, Silvano
Composite indicators are widely used to score or classify units evaluated on multiple criteria. Their construction involves aggregating criteria evaluations, a common practice in Multiple Criteria Decision Aiding (MCDA). In MCDA, various methods have been proposed to address key aspects of multiple criteria evaluations, such as the measurement scales of the criteria, the degree of acceptable compensation between them, and their potential interactions. However, beyond producing a final score or classification, it is essential to ensure the explainability and interpretability of results as well as the procedure's transparency. This paper proposes a method for constructing explainable and interpretable composite indicators using " if..., then... " decision rules. We consider the explainability and interpretability of composite indicators in four scenarios: (i) decision rules explain numerical scores obtained from an aggregation of numerical codes corresponding to ordinal qualifiers; (ii) an obscure numerical composite indicator classifies units into quantiles; (iii) given preference information provided by a Decision Maker in the form of classifications of some reference units, a composite indicator is constructed using decision rules; (iv) the classification of a set of units results from the application of an MCDA method and is explained by decision rules. To induce the rules from scored or classified units, we apply the Dominance-based Rough Set Approach. The resulting decision rules relate the class assignment or unit's score to threshold conditions on values of selected indicators in an intelligible way, clarifying the underlying rationale. Moreover, they serve to recommend composite indicator assessment for new units of interest.
- Asia > Laos (0.14)
- South America > Brazil (0.04)
- Africa > Senegal (0.04)
- (182 more...)
- Health & Medicine (1.00)
- Materials > Metals & Mining (0.93)
Revisiting Noise in Natural Language Processing for Computational Social Science
Computational Social Science (CSS) is an emerging field driven by the unprecedented availability of human-generated content for researchers. This field, however, presents a unique set of challenges due to the nature of the theories and datasets it explores, including highly subjective tasks and complex, unstructured textual corpora. Among these challenges, one of the less well-studied topics is the pervasive presence of noise. This thesis aims to address this gap in the literature by presenting a series of interconnected case studies that examine different manifestations of noise in CSS. These include character-level errors following the OCR processing of historical records, archaic language, inconsistencies in annotations for subjective and ambiguous tasks, and even noise and biases introduced by large language models during content generation. This thesis challenges the conventional notion that noise in CSS is inherently harmful or useless. Rather, it argues that certain forms of noise can encode meaningful information that is invaluable for advancing CSS research, such as the unique communication styles of individuals or the culture-dependent nature of datasets and tasks. Further, this thesis highlights the importance of nuance in dealing with noise and the considerations CSS researchers must address when encountering it, demonstrating that different types of noise require distinct strategies.
- North America > Canada (0.45)
- Europe > France (0.45)
- Europe > Spain (0.45)
- (40 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- (2 more...)
- Media > News (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Information Technology (1.00)
- (9 more...)
GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking
Schneider, Florian, Holtermann, Carolin, Biemann, Chris, Lauscher, Anne
Large Vision-Language Models (LVLMs) have recently gained attention due to their distinctive performance and broad applicability. While it has been previously shown that their efficacy in usage scenarios involving non-Western contexts falls short, existing studies are limited in scope, covering just a narrow range of cultures, focusing exclusively on a small number of cultural aspects, or evaluating a limited selection of models on a single task only. Towards globally inclusive LVLM research, we introduce GIMMICK, an extensive multimodal benchmark designed to assess a broad spectrum of cultural knowledge across 144 countries representing six global macro-regions. GIMMICK comprises six tasks built upon three new datasets that span 728 unique cultural events or facets on which we evaluated 20 LVLMs and 11 LLMs, including five proprietary and 26 open-weight models of all sizes. We systematically examine (1) regional cultural biases, (2) the influence of model size, (3) input modalities, and (4) external cues. Our analyses reveal strong biases toward Western cultures across models and tasks and highlight strong correlations between model size and performance, as well as the effectiveness of multimodal input and external geographic cues. We further find that models have more knowledge of tangible than intangible aspects (e.g., food vs. rituals) and that they excel in recognizing broad cultural origins but struggle with a more nuanced understanding.
- South America > Colombia (0.28)
- Africa > Republic of the Congo (0.28)
- Europe > Germany (0.14)
- (169 more...)
- Law (1.00)
- Health & Medicine (1.00)
- Media (0.92)
- (3 more...)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
What is in a name? Mitigating Name Bias in Text Embeddings via Anonymization
Manchanda, Sahil, Shivaswamy, Pannaga
Text-embedding models often exhibit biases arising from the data on which they are trained. In this paper, we examine a hitherto unexplored bias in text-embeddings: bias arising from the presence of $\textit{names}$ such as persons, locations, organizations etc. in the text. Our study shows how the presence of $\textit{name-bias}$ in text-embedding models can potentially lead to erroneous conclusions in assessment of thematic similarity.Text-embeddings can mistakenly indicate similarity between texts based on names in the text, even when their actual semantic content has no similarity or indicate dissimilarity simply because of the names in the text even when the texts match semantically. We first demonstrate the presence of name bias in different text-embedding models and then propose $\textit{text-anonymization}$ during inference which involves removing references to names, while preserving the core theme of the text. The efficacy of the anonymization approach is demonstrated on two downstream NLP tasks, achieving significant performance gains. Our simple and training-optimization-free approach offers a practical and easily implementable solution to mitigate name bias.
- South America > Peru (0.14)
- South America > Guyana (0.14)
- South America > Ecuador (0.14)
- (148 more...)