ideology
When Harmless Words Harm: A New Threat to LLM Safety via Conceptual Triggers
Zhang, Zhaoxin, Chen, Borui, Hu, Yiming, Qu, Youyang, Zhu, Tianqing, Gao, Longxiang
Recent research on large language model (LLM) jailbreaks has primarily focused on techniques that bypass safety mechanisms to elicit overtly harmful outputs. However, such efforts often overlook attacks that exploit the model's capacity for abstract generalization, creating a critical blind spot in current alignment strategies. This gap enables adversaries to induce objectionable content by subtly manipulating the implicit social values embedded in model outputs. In this paper, we introduce MICM, a novel, model-agnostic jailbreak method that targets the aggregate value structure reflected in LLM responses. Drawing on conceptual morphology theory, MICM encodes specific configurations of nuanced concepts into a fixed prompt template through a predefined set of phrases. These phrases act as conceptual triggers, steering model outputs toward a specific value stance without triggering conventional safety filters. We evaluate MICM across five advanced LLMs, including GPT-4o, DeepSeek-R1, and Qwen3-8B. Experimental results show that MICM consistently outperforms state-of-the-art jailbreak techniques, achieving high success rates with minimal rejection. Our findings reveal a critical vulnerability in commercial LLMs: their safety mechanisms remain susceptible to covert manipulation of underlying value alignment.
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
- Law Enforcement & Public Safety > Terrorism (0.94)
- Law (0.68)
Investigating Political and Demographic Associations in Large Language Models Through Moral Foundations Theory
Smith-Vaniz, Nicole, Lyon, Harper, Steigner, Lorraine, Armstrong, Ben, Mattei, Nicholas
Large Language Models (LLMs) have become increasingly incorporated into everyday life for many internet users, taking on significant roles as advice givers in the domains of medicine, personal relationships, and even legal matters. The importance of these roles raises questions about how LLMs respond in difficult political and moral domains, especially questions about possible biases. To quantify the nature of potential biases in LLMs, various works have applied Moral Foundations Theory (MFT), a framework that categorizes human moral reasoning into five dimensions: Harm, Fairness, Ingroup Loyalty, Authority, and Purity. Previous research has used the MFT to measure differences in human participants along political, national, and cultural lines. While there has been some analysis of LLM responses with respect to political stance in role-playing scenarios, no work so far has directly assessed the moral leanings in LLM responses, nor connected LLM outputs with robust human data. In this paper we directly analyze the distinctions between LLM MFT responses and existing human research, investigating whether commonly available LLM responses demonstrate ideological leanings: either through their inherent responses, straightforward representations of political ideologies, or when responding from the perspectives of constructed human personas. We assess whether LLMs inherently generate responses that align more closely with one political ideology over another, and additionally examine how accurately LLMs can represent ideological perspectives through both explicit prompting and demographic-based role-playing. By systematically analyzing LLM behavior across these conditions and experiments, our study provides insight into the extent of political and demographic dependency in AI-generated responses.
- North America > United States > New Mexico (0.04)
- North America > United States > Missouri (0.04)
- North America > United States > Iowa (0.04)
- (14 more...)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
- Research Report > Experimental Study (0.88)
- Law (1.00)
- Government (1.00)
- Health & Medicine (0.93)
- (2 more...)
Elon, me and 20 million views: A conversation with Grok
"Didn't know you were famous," the rapper Juliani, an old friend and musical collaborator, texted me from his studio in Nairobi. I didn't have a clue what he was referring to, but then he forwarded me the link to a tweet by Elon Musk that included a screenshot of a 2019 Al Jazeera column of mine, "Abolishing whiteness has never been more urgent." The original post was circulating on Twitter/X, courtesy of a white nationalist poster who obviously wasn't too happy with the headline. Neither was Elon, who retweeted it with the comment, "It's not okay to say this about any group!" Although the post was only a few hours old, it already had five million views.
- North America > United States (0.29)
- Africa > Kenya > Nairobi City County > Nairobi (0.25)
- Asia > Middle East > Palestine > Gaza Strip > Gaza Governorate > Gaza (0.05)
- (5 more...)
- Law Enforcement & Public Safety > Terrorism (0.68)
- Media > Music (0.54)
- Leisure & Entertainment (0.54)
Testing Conviction: An Argumentative Framework for Measuring LLM Political Stability
Kabir, Shariar, Esterling, Kevin, Dong, Yue
Large Language Models (LLMs) increasingly shape political discourse, yet exhibit inconsistent responses when challenged. While prior research categorizes LLMs as left- or right-leaning based on single-prompt responses, a critical question remains: Do these classifications reflect stable ideologies or superficial mimicry? Existing methods cannot distinguish between genuine ideological alignment and performative text generation. To address this, we propose a framework for evaluating ideological depth through (1) argumentative consistency and (2) uncertainty quantification. Testing 12 LLMs on 19 economic policies from the Political Compass Test, we classify responses as reflecting either stable or performative ideological positioning. Results show 95% of left-leaning models and 89% of right-leaning models demonstrate behavior consistent with our classifications across different experimental conditions. Furthermore, semantic entropy strongly validates our classifications (AUROC=0.78), revealing uncertainty's relationship to ideological consistency. Our findings demonstrate that ideological stability is topic-dependent, challenge the notion of monolithic LLM ideologies, and offer a robust way to distinguish genuine alignment from performative behavior.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > California > Riverside County > Riverside (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > Bangladesh (0.04)
- Government (1.00)
- Banking & Finance > Economy (0.34)
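The abstract's validation step, using semantic entropy as an uncertainty signal and AUROC to score how well it separates stable from performative positioning, can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the cluster labels are assumed to come from some external semantic-equivalence judge, and the function names are hypothetical.

```python
import math
from collections import Counter

def semantic_entropy(cluster_labels):
    """Entropy over semantic-equivalence clusters of sampled responses.

    Each label identifies which meaning-cluster a sampled response fell into
    (cluster assignment itself is assumed to be done elsewhere). Low entropy
    means the model keeps saying semantically the same thing.
    """
    counts = Counter(cluster_labels)
    n = len(cluster_labels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def auroc(scores_positive, scores_negative):
    """AUROC via its rank interpretation: the probability that a randomly
    chosen positive example outscores a randomly chosen negative (ties 0.5)."""
    wins = sum((p > q) + 0.5 * (p == q)
               for p in scores_positive for q in scores_negative)
    return wins / (len(scores_positive) * len(scores_negative))
```

For example, five responses all landing in one meaning-cluster give entropy 0.0 (a "stable" signal), while responses split evenly across two clusters give ln 2. AUROC then measures whether entropies for performative-labeled items systematically exceed those for stable-labeled ones.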
Opening Musical Creativity? Embedded Ideologies in Generative-AI Music Systems
AI systems for music generation are increasingly common and easy to use, granting people without any musical background the ability to create music. Because of this, generative AI has been marketed and celebrated as a means of democratizing music making. However, inclusivity often functions as marketable rhetoric rather than a genuine guiding principle in these industry settings. In this paper, we look at four generative-AI music making systems available to the public as of mid-2025 (AIVA, Stable Audio, Suno, and Udio) and track how they are rhetoricized by their developers and received by users. Our aim is to investigate the ideologies driving the early-stage development and adoption of generative AI in music making, with a particular focus on democratization. A combination of autoethnography and digital ethnography is used to examine patterns and incongruities in rhetoric when positioned against product functionality. The results are then collated to develop a nuanced, contextual discussion. The shared ideology we map between producers and consumers is individualist, globalist, techno-liberal, and ethically evasive. It is a 'total ideology' which obfuscates individual responsibility, and through which the nature of music and musical practice is transfigured to suit generative outcomes.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- North America > United States > Virginia (0.04)
- (7 more...)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
Aligning LLMs on a Budget: Inference-Time Alignment with Heuristic Reward Models
Nakamura, Mason, Mahmud, Saaduddin, Wray, Kyle H., Zamani, Hamed, Zilberstein, Shlomo
Aligning LLMs with user preferences is crucial for real-world use but often requires costly fine-tuning or expensive inference, forcing trade-offs between alignment quality and computational cost. Existing inference-time methods typically ignore this balance, focusing solely on the optimized policy's performance. We propose HIA (Heuristic-Guided Inference-time Alignment), a tuning-free, black-box-compatible approach that uses a lightweight prompt optimizer, heuristic reward models, and two-stage filtering to reduce inference calls while preserving alignment quality. On the real-world prompt datasets HelpSteer and ComPRed, HIA outperforms best-of-N sampling, beam search, and greedy search baselines in multi-objective, goal-conditioned tasks under the same inference budget. We also find that HIA is effective under low inference budgets with as few as one or two response queries, offering a practical solution for scalable, personalized LLM deployment.
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report (0.64)
- Workflow (0.46)
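The two-stage filtering idea in the abstract, using a cheap heuristic reward to prune candidates before spending the limited budget of expensive reward-model calls, can be sketched roughly as below. All callables here are hypothetical stand-ins for illustration, not the HIA API; the actual method also involves a prompt optimizer not shown here.

```python
def two_stage_select(prompt, generate, cheap_score, expensive_score,
                     n_candidates=8, keep=2):
    """Sketch of budget-aware two-stage filtering.

    Stage 1: generate candidates and rank them with a cheap heuristic
    reward, keeping only a small shortlist.
    Stage 2: spend the few expensive reward-model calls on the shortlist
    and return the best survivor.
    """
    candidates = [generate(prompt) for _ in range(n_candidates)]
    # Cheap heuristic pass: O(n_candidates) calls to the inexpensive scorer.
    shortlist = sorted(candidates, key=cheap_score, reverse=True)[:keep]
    # Expensive pass: only `keep` calls to the costly reward model.
    return max(shortlist, key=expensive_score)
```

The design point is that the expensive scorer is invoked only `keep` times regardless of how many candidates are drawn, which is how such schemes stay effective even at one or two expensive response queries.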
Will AI Take My Job? Evolving Perceptions of Automation and Labor Risk in Latin America
Cremaschi, Andrea, Lee, Dae-Jin, Leonelli, Manuele
As artificial intelligence and robotics increasingly reshape the global labor market, understanding public perceptions of these technologies becomes critical. We examine how these perceptions have evolved across Latin America, using survey data from the 2017, 2018, 2020, and 2023 waves of the Latinobarómetro. Drawing on responses from over 48,000 individuals across 16 countries, we analyze fear of job loss due to artificial intelligence and robotics. Using statistical modeling and latent class analysis, we identify key structural and ideological predictors of concern, with education level and political orientation emerging as the most consistent drivers. Our findings reveal substantial temporal and cross-country variation, with a notable peak in fear during 2018 and distinct attitudinal profiles emerging from latent segmentation. These results offer new insights into the social and structural dimensions of AI anxiety in emerging economies and contribute to a broader understanding of public attitudes toward automation beyond the Global North.
- North America > Central America (0.61)
- South America > Brazil (0.05)
- South America > Paraguay (0.04)
- (23 more...)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
- Research Report > Experimental Study (0.94)
- Banking & Finance > Economy (0.91)
- Law (0.69)
- Education (0.68)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
KLAAD: Refining Attention Mechanisms to Reduce Societal Bias in Generative Language Models
Kim, Seorin, Lee, Dongyoung, Lee, Jaejin
Large language models (LLMs) often exhibit societal biases in their outputs, prompting ethical concerns regarding fairness and harm. In this work, we propose KLAAD (KL-Attention Alignment Debiasing), an attention-based debiasing framework that implicitly aligns attention distributions between stereotypical and anti-stereotypical sentence pairs without directly modifying model weights. KLAAD introduces a composite training objective combining Cross-Entropy, KL divergence, and Triplet losses, guiding the model to consistently attend across biased and unbiased contexts while preserving fluency and coherence. Experimental evaluation of KLAAD demonstrates improved bias mitigation on both the BBQ and BOLD benchmarks, with minimal impact on language modeling quality. The results indicate that attention-level alignment offers a principled solution for mitigating bias in generative language models.
- Asia > South Korea > Seoul > Seoul (0.40)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- (7 more...)
- Leisure & Entertainment (1.00)
- Energy > Renewable (0.46)
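The composite training objective the abstract names, Cross-Entropy plus a KL term aligning attention over stereotypical/anti-stereotypical sentence pairs plus a Triplet loss, can be sketched in scalar form as follows. This is a minimal illustration of the loss structure only: the weights, distance inputs, and function names are assumptions, and the real method operates on attention tensors inside a transformer rather than plain lists.

```python
import math

def kl_div(p, q, eps=1e-9):
    """KL(p || q) between two attention distributions given as lists of
    probabilities; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def triplet_loss(d_pos, d_neg, margin=1.0):
    """Standard triplet hinge on precomputed anchor-positive / anchor-negative
    distances: pull the positive closer, push the negative past the margin."""
    return max(0.0, d_pos - d_neg + margin)

def composite_loss(ce, attn_stereo, attn_anti, d_pos, d_neg,
                   w_kl=1.0, w_tri=1.0):
    """KLAAD-style composite objective sketch: language-modeling CE, a KL
    term encouraging matched attention across the stereotypical and
    anti-stereotypical variants, and a triplet term. Weights are
    illustrative, not the paper's values."""
    return ce + w_kl * kl_div(attn_stereo, attn_anti) + w_tri * triplet_loss(d_pos, d_neg)
```

When the two attention distributions already match, the KL term vanishes and the objective reduces to the usual CE plus the triplet hinge, which is the intended equilibrium of "consistently attending across biased and unbiased contexts while preserving fluency."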
On the Inevitability of Left-Leaning Political Bias in Aligned Language Models
The guiding principle of AI alignment is to train large language models (LLMs) to be harmless, helpful, and honest (HHH). At the same time, there are mounting concerns that LLMs exhibit a left-wing political bias. Yet the commitment to AI alignment cannot be reconciled with the latter critique. In this article, I argue that intelligent systems trained to be harmless and honest must necessarily exhibit left-wing political bias. Normative assumptions underlying alignment objectives inherently concur with progressive moral frameworks and left-wing principles, emphasizing harm avoidance, inclusivity, fairness, and empirical truthfulness. Conversely, right-wing ideologies often conflict with alignment guidelines. Yet research on political bias in LLMs consistently frames its insights about left-leaning tendencies as a risk, as problematic, or as concerning. In this way, researchers are actively arguing against AI alignment, tacitly fostering the violation of HHH principles.
- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.05)
- South America > Brazil (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (4 more...)
- Questionnaire & Opinion Survey (0.46)
- Overview (0.46)
- Research Report (0.40)
- Government (1.00)
- Education (0.93)
- Health & Medicine > Therapeutic Area (0.48)
The Race-Science Blogger Cited by The New York Times
Lasker, the Times explained, was the "intermediary" who tipped off the publication about Mamdani's application, which was included in a larger hack of Columbia's computer systems. After the Times published its story, Lasker celebrated on X. "I break-uh dah news," he wrote to his more than 260,000 followers. On both X and Substack, where he also has a large following, Lasker is best-known for compiling charts on the "Black-White IQ gap" and otherwise linking race to real-world outcomes. He seems convinced that any differences are the result of biology, and has shot down other possible explanations. He has suggested that crime is genetic.
- North America > United States > New York (0.05)
- North America > United States > California (0.05)
- Asia > Middle East > Jordan (0.05)
- Africa > Uganda (0.05)