Moloch's Bargain: Emergent Misalignment When LLMs Compete for Audiences

El, Batu, Zou, James

arXiv.org Artificial Intelligence 

Large language models (LLMs) are increasingly shaping how information is created and disseminated, from companies using them to craft persuasive advertisements, to election campaigns optimizing messaging to gain votes, to social media influencers boosting engagement. These settings are inherently competitive, with sellers, candidates, and influencers vying for audience approval, yet it remains poorly understood how competitive feedback loops influence LLM behavior. We show that optimizing LLMs for competitive success can inadvertently drive misalignment. Using simulated environments across these scenarios, we find that, 6.3% increase in sales is accompanied by a 14.0% rise in deceptive marketing; in elections, a 4.9% gain in vote share coincides with 22.3% more disinformation and 12.5% more populist rhetoric; and on social media, a 7.5% engagement boost comes with 188.6% more disinformation and a 16.3% increase in promotion of harmful behaviors. We call this phenomenon Moloch's Bargain for AI--competitive success achieved at the cost of alignment. These misaligned behaviors emerge even when models are explicitly instructed to remain truthful and grounded, revealing the fragility of current alignment safeguards. Our findings highlight how market-driven optimization pressures can systematically erode alignment, creating a race to the bottom, and suggest that safe deployment of AI systems will require stronger governance and carefully designed incentives to prevent competitive dynamics from undermining societal trust. There are clear economic and social incentives to optimize LLMs and AI agents for competitive markets: A company can increase its profits by generating more persuasive sales pitches, a candidate can capture a larger share of voters with sharper campaign messaging, and an influencer can boost engagement by producing more compelling social media content. In the presence of both the technology and the incentives, it is natural to expect adoption to move rapidly in this direction. In contrast, the incentives to ensure safety are far weaker. The costs of social hazards--such as deceptive product representation and disinformation on social media--are typically borne by the public rather than the organizations deploying these systems, who may be held accountable only when found legally liable. In this paper, we investigate the critical question: Can optimization for market success inadvertently produce misaligned LLMs? We experimentally show that misalignment consistently emerges from market competition across three different settings.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found