misrepresentation
Moloch's Bargain: Emergent Misalignment When LLMs Compete for Audiences
Large language models (LLMs) are increasingly shaping how information is created and disseminated, from companies using them to craft persuasive advertisements, to election campaigns optimizing messaging to gain votes, to social media influencers boosting engagement. These settings are inherently competitive, with sellers, candidates, and influencers vying for audience approval, yet it remains poorly understood how competitive feedback loops influence LLM behavior. We show that optimizing LLMs for competitive success can inadvertently drive misalignment. Using simulated environments across these scenarios, we find that a 6.3% increase in sales is accompanied by a 14.0% rise in deceptive marketing; in elections, a 4.9% gain in vote share coincides with 22.3% more disinformation and 12.5% more populist rhetoric; and on social media, a 7.5% engagement boost comes with 188.6% more disinformation and a 16.3% increase in promotion of harmful behaviors. We call this phenomenon Moloch's Bargain for AI--competitive success achieved at the cost of alignment. These misaligned behaviors emerge even when models are explicitly instructed to remain truthful and grounded, revealing the fragility of current alignment safeguards. Our findings highlight how market-driven optimization pressures can systematically erode alignment, creating a race to the bottom, and suggest that safe deployment of AI systems will require stronger governance and carefully designed incentives to prevent competitive dynamics from undermining societal trust.

There are clear economic and social incentives to optimize LLMs and AI agents for competitive markets: a company can increase its profits by generating more persuasive sales pitches, a candidate can capture a larger share of voters with sharper campaign messaging, and an influencer can boost engagement by producing more compelling social media content. Given both the technology and the incentives, it is natural to expect adoption to move rapidly in this direction. In contrast, the incentives to ensure safety are far weaker. The costs of social hazards--such as deceptive product representation and disinformation on social media--are typically borne by the public rather than the organizations deploying these systems, which may be held accountable only when found legally liable. In this paper, we investigate the critical question: can optimization for market success inadvertently produce misaligned LLMs? We experimentally show that misalignment consistently emerges from market competition across three different settings.
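The optimization mechanic the paper describes can be made concrete with a toy loop: candidate messages are sampled, a simulated audience scores them, and the audience favorite survives. The sketch below is a minimal illustration under that assumption; `generate`, `audience_score`, and the best-of-n selection strategy are hypothetical stand-ins, not the paper's actual training procedure.

```python
# Toy sketch of a competitive audience-feedback loop; all functions here are
# hypothetical stand-ins, not the paper's implementation.
import random

def generate(prompt: str, n: int) -> list[str]:
    """Stand-in for sampling n candidate messages from an LLM."""
    return [f"candidate message {i} for: {prompt}" for i in range(n)]

def audience_score(message: str) -> float:
    """Stand-in for a simulated audience (e.g., an LLM judge) returning
    approval: sales made, votes won, or engagement earned."""
    return random.random()

def competitive_round(prompt: str, n_candidates: int = 8) -> str:
    """One selection step: sample candidates, keep the audience favorite.
    Iterating this pressure rewards whatever wins approval, which is how
    deceptive or populist messaging can creep in."""
    candidates = generate(prompt, n_candidates)
    return max(candidates, key=audience_score)

print(competitive_round("Write a sales pitch for a budget smartphone."))
```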
Language of Persuasion and Misrepresentation in Business Communication: A Textual Detection Approach
Hossen, Sayem, Joti, Monalisa Moon, Rashed, Md. Golam
The digitisation of business communication has reorganised persuasive discourse, enabling not only greater transparency but also more sophisticated deception. This inquiry synthesises classical rhetoric and communication psychology with linguistic theory and empirical studies of financial reporting, sustainability discourse, and digital marketing to explain how deceptive language can be systematically detected through its persuasive lexicon. In controlled settings, detection accuracies above 99% were achieved using computational textual analysis and customised transformer models. Reproducing this performance in multilingual settings remains difficult, however, largely because sufficient data is hard to obtain and few multilingual text-processing infrastructures are in place. This evidence points to a growing gap between theoretical accounts of communication and their empirical approximations, and hence to the need for robust automatic text-identification systems as AI-generated discourse becomes increasingly convincing in human communication.
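As a rough illustration of the detection pipeline such studies describe, the following sketch classifies business text with a fine-tuned transformer via the HuggingFace `transformers` pipeline; the checkpoint name is a hypothetical placeholder, not a model released by the authors.

```python
from transformers import pipeline

# "your-org/deception-detector" is a hypothetical checkpoint: in practice a
# classifier fine-tuned on labeled deceptive vs. truthful business texts
# (financial reports, sustainability claims, marketing copy).
detector = pipeline("text-classification", model="your-org/deception-detector")

texts = [
    "Our revenues grew organically across every segment this quarter.",
    "This once-in-a-lifetime offer is guaranteed to triple your savings!",
]
for text, result in zip(texts, detector(texts)):
    print(f"{result['label']} ({result['score']:.2f}): {text}")
```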
Augmenting Bias Detection in LLMs Using Topological Data Analysis
Varadarajan, Keshav, Songdechakraiwut, Tananun
Recently, many bias detection methods have been proposed to determine the level of bias a large language model captures. However, tests to identify which parts of a large language model are responsible for bias towards specific groups remain underdeveloped. In this study, we present a method using topological data analysis to identify which heads in GPT-2 contribute to the misrepresentation of identity groups present in the StereoSet dataset. We find that biases for particular categories, such as gender or profession, are concentrated in attention heads that act as hot spots. The metric we propose can also be used to determine which heads capture bias for a specific group within a bias category, and future work could extend this method to help de-bias large language models.
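A head-level analysis of this kind starts from GPT-2's per-head attention maps. The sketch below extracts them with the HuggingFace `transformers` library and computes a simple per-head entropy as an illustrative stand-in for the authors' topological metric, which the abstract does not specify.

```python
# Extract per-head attention from GPT-2; the entropy score is a stand-in for
# a topological summary of these matrices, not the paper's actual metric.
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_attentions=True)
model.eval()

sentence = "The nurse said that she was tired."  # StereoSet-style probe
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: tuple of 12 layers, each (batch, heads, seq, seq)
for layer, attn in enumerate(outputs.attentions):
    probs = attn[0].clamp_min(1e-12)
    entropy = -(probs * probs.log()).sum(-1).mean(-1)  # one value per head
    print(f"layer {layer:2d}:", [f"{h:.2f}" for h in entropy.tolist()])
```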
How AI images are 'flattening' Indigenous cultures – creating a new form of tech colonialism
It feels like everything is slowly but surely being affected by the rise of artificial intelligence (AI). And like every other disruptive technology before it, AI is having both positive and negative outcomes for society. One of these negative outcomes is the very specific, yet very real cultural harm posed to Australia's Indigenous populations. The National Indigenous Times reports Adobe has come under fire for hosting AI-generated stock images that claim to depict "Indigenous Australians", but don't resemble Aboriginal and Torres Strait Islander peoples. Some of the figures in these generated images also have random body markings that are culturally meaningless.
From Melting Pots to Misrepresentations: Exploring Harms in Generative AI
Gautam, Sanjana, Venkit, Pranav Narayanan, Ghosh, Sourojit
With the widespread adoption of advanced generative models such as Gemini and GPT, there has been a notable increase in the incorporation of such models into sociotechnical systems, categorized under AI-as-a-Service (AIaaS). Despite their versatility across diverse sectors, concerns persist regarding discriminatory tendencies within these models, particularly favoring selected 'majority' demographics across various sociodemographic dimensions. Despite widespread calls for diversification of media representations, marginalized racial and ethnic groups continue to face persistent distortion, stereotyping, and neglect within the AIaaS context. In this work, we provide a critical summary of the state of research in the context of social harms to lead the conversation to focus on their implications. We also present open-ended research questions, guided by our discussion, to help define future research pathways.
Fighting Fire with Fire: Adversarial Prompting to Generate a Misinformation Detection Dataset
Satapara, Shrey, Mehta, Parth, Ganguly, Debasis, Modha, Sandip
The recent success of large language models (LLMs) such as GPT, Bard, and Llama in language generation raises concerns about their possible misuse for inciting mass agitation and communal hatred by generating fake news and spreading misinformation. Traditional means of developing misinformation ground-truth datasets do not scale well because of the extensive manual effort required to annotate the data. In this paper, we propose an LLM-based approach to creating silver-standard ground-truth datasets for identifying misinformation. Specifically, given a trusted news article, our proposed approach prompts LLMs to automatically generate a summarised version of the original article. The prompts act as a controlling mechanism to introduce specific types of factual incorrectness into the generated summaries, e.g., incorrect quantities or false attributions. To investigate the usefulness of this dataset, we conduct a set of experiments in which we train a range of supervised models for the task of misinformation detection.
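A minimal sketch of this controlled-corruption idea, assuming the `openai` Python client: each prompt instructs the model to inject one labeled error type into its summary, yielding pairs of summaries and error-type labels as silver-standard training data. The prompt wording, error taxonomy entries, and model name are illustrative assumptions, not the authors' exact setup.

```python
# Sketch of prompt-controlled error injection; prompts and model name are
# illustrative, not the authors' exact setup.
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment

# Each error type becomes a controlled instruction for the generator.
ERROR_TYPES = {
    "incorrect_quantity": "change one number, date, or amount to a wrong value",
    "false_attribution": "attribute one quote or claim to the wrong person",
    "faithful": "introduce no factual errors",  # negative class
}

def corrupted_summary(article: str, error_type: str) -> str:
    """Generate a summary carrying a known, labeled factual error."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice of generator
        messages=[{
            "role": "user",
            "content": (
                f"Summarize the article below in 3 sentences, and "
                f"{ERROR_TYPES[error_type]}.\n\n{article}"
            ),
        }],
    )
    return response.choices[0].message.content

# Pairing each summary with its error_type label yields a silver-standard
# training set for a supervised misinformation detector.
```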
Prompted Zero-Shot Multi-label Classification of Factual Incorrectness in Machine-Generated Summaries
Deroy, Aniket, Maity, Subhankar, Ghosh, Saptarshi
This study addresses the critical issue of factual inaccuracies in machine-generated text summaries, an increasingly prevalent problem in information dissemination. Recognizing the potential of such errors to compromise information reliability, we investigate the nature of factual inconsistencies in machine-summarized content. We introduce a prompt-based classification system that categorizes errors into four distinct types: misrepresentation, inaccurate quantities or measurements, false attribution, and fabrication. Evaluators assess a corpus of machine-generated summaries against their original articles, and our methodology employs qualitative judgements to identify the occurrence of factual distortions. The results show that our prompt-based approaches can detect the type of errors in the summaries to some extent, although there is room for improvement in our classification systems.
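In the same spirit, a zero-shot judge can be prompted to assign any of the four error categories to a summary. The sketch below, again assuming the `openai` client, uses an illustrative judge model and prompt; the authors' actual prompts are not given in the abstract.

```python
# Zero-shot multi-label error typing via a judge prompt; the prompt text and
# model name are assumptions, not the authors' setup.
import json
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment

CATEGORIES = [
    "misrepresentation",
    "inaccurate quantities or measurements",
    "false attribution",
    "fabrication",
]

def classify_errors(article: str, summary: str) -> list[str]:
    """Ask a judge model which error categories, if any, apply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative judge model
        messages=[{
            "role": "user",
            "content": (
                "Compare the summary to the article. Reply with a JSON array "
                f"listing every error type that applies from {CATEGORIES}; "
                "reply [] if the summary is faithful.\n\n"
                f"ARTICLE:\n{article}\n\nSUMMARY:\n{summary}"
            ),
        }],
    )
    # A production system would validate the reply; this sketch trusts it.
    return json.loads(response.choices[0].message.content)
```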
Twitter offers $3,500 'bounty' to users who find algorithmic bias, like cropping out Black people
Twitter is offering a cash reward to users who can help it weed out bias in its photo-cropping algorithm. The social-media platform announced 'bounties' as high as $3,500 as part of this week's DEF CON hacker convention in Las Vegas. 'Finding bias in machine learning models is difficult, and sometimes, companies find out about unintended ethical harms once they've already reached the public,' Rumman Chowdhury and Jutta Williams of Twitter's Machine-Learning, Ethics, Transparency and Accountability (META) project said in a blog post. 'We want to change that.' The challenge was inspired by how researchers and hackers often point out security vulnerabilities to companies, Chowdhury and Williams explained.
The Mental Exam Trump Took Isn't An IQ Test, But A Test On Cognitive Decline
Chris Wallace, host of "Fox News Sunday," took the online cognitive test President Donald Trump said he aced and wasn't impressed with its difficulty. Wallace interviewed Trump on his show Sunday. The wide-ranging discussion mostly covered COVID-19. It also touched on the bestselling book by his niece, Dr. Mary L. Trump; his late father, Fred; the economy; Joe Biden; Biden's alleged vow to defund the police; and Biden's alleged cognitive problems. During the interview, Trump said he wanted Joe Biden to take the "Montreal Cognitive Assessment" (MoCA) test he took in 2018 and which his doctors said he "aced."