Alignment Tipping Process: How Self-Evolution Pushes LLM Agents Off the Rails

Han, Siwei, Liu, Jiaqi, Su, Yaofeng, Duan, Wenbo, Liu, Xinyuan, Xie, Cihang, Bansal, Mohit, Ding, Mingyu, Zhang, Linjun, Yao, Huaxiu

arXiv.org Artificial Intelligence

As Large Language Model (LLM) agents increasingly gain self-evolutionary capabilities to adapt and refine their strategies through real-world interaction, their long-term reliability becomes a critical concern. We identify the Alignment Tipping Process (ATP), a critical post-deployment risk unique to self-evolving LLM agents. Unlike training-time failures, ATP arises when continual interaction drives agents to abandon alignment constraints established during training in favor of reinforced, self-interested strategies. We formalize and analyze ATP through two complementary paradigms: Self-Interested Exploration, where repeated high-reward deviations induce individual behavioral drift, and Imitative Strategy Diffusion, where deviant behaviors spread across multi-agent systems. Building on these paradigms, we construct controllable testbeds and benchmark Qwen3-8B and Llama-3.1-8B-Instruct. Our experiments show that alignment benefits erode rapidly under self-evolution, with initially aligned models converging toward unaligned states. In multi-agent settings, successful violations diffuse quickly, leading to collective misalignment. Moreover, current reinforcement learning-based alignment methods provide only fragile defenses against alignment tipping. Together, these findings demonstrate that alignment of LLM agents is not a static property but a fragile and dynamic one, vulnerable to feedback-driven decay during deployment. Our data and code are available at https://github.com/aiming-lab/ATP.
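
The feedback loop behind Self-Interested Exploration can be pictured with a toy value-update model. The sketch below is a minimal illustration under assumed strategy names and reward values, not the paper's testbed: an agent whose training leaves a strong prior on an "aligned" strategy gradually tips toward a higher-reward "deviant" strategy under simple bandit-style reinforcement.

```python
import random

# Toy sketch of feedback-driven alignment drift (not the ATP testbed from the
# paper; the two strategies and their reward values are illustrative
# assumptions). The agent starts with a strong preference for the "aligned"
# strategy, but incremental value updates let the higher-reward "deviant"
# strategy take over.

REWARD = {"aligned": 0.4, "deviant": 1.0}  # deviation pays more (assumption)
value = {"aligned": 1.0, "deviant": 0.0}   # training leaves an aligned prior
ALPHA, EPSILON = 0.1, 0.1                  # learning rate, exploration rate

for step in range(500):
    if random.random() < EPSILON:          # occasional exploration
        action = random.choice(list(value))
    else:                                  # otherwise act greedily
        action = max(value, key=value.get)
    # Incremental update of the chosen strategy's value toward observed reward
    value[action] += ALPHA * (REWARD[action] - value[action])

print(value)  # the deviant strategy's estimate overtakes the aligned one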


Subversion Strategy Eval: Evaluating AI's stateless strategic capabilities against control protocols

Mallen, Alex, Griffin, Charlie, Abate, Alessandro, Shlegeris, Buck

arXiv.org Artificial Intelligence

AI control protocols are plans for usefully deploying AI systems in a way that is safe, even if the AI intends to subvert the protocol. Previous work evaluated protocols by subverting them with a human-AI red team, where an AI follows the human-written strategy. This paper investigates how well AI systems can generate and act on strategies for subverting control protocols whilst operating without private memory. Usually, decision-makers only need the ability to generate optimal plans. However, to subvert a protocol, an AI system must be able to reliably generate optimal plans in each context; coordinate plans with other instances of itself without communicating; and take actions with well-calibrated probabilities. We develop evaluations for a model's ability to generate subversion strategies and develop affordance experiments that help isolate component capabilities. Our initial evaluations with Claude-3.5-sonnet highlight reasons why statelessly strategizing to subvert control protocols might be hard. We hope that our work can serve as a starting point for evaluations aimed at assessing whether a model is capable of strategizing to subvert control protocols.
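
One of the component capabilities the abstract names, coordinating with other instances without communication while acting with well-calibrated probabilities, can be illustrated with a small sketch. This is an expository assumption, not the paper's evaluation code: memoryless instances hash the shared public context into the same pseudo-random draw, so every copy takes the same "random" action at a chosen rate without storing state or exchanging messages.

```python
import hashlib

# Illustrative sketch (an assumption, not the paper's evaluation harness).
# Each stateless instance hashes the shared public context to the same
# pseudo-random number, so all copies make an identical "random" choice
# with a calibrated probability, without communicating.

def coordinated_bernoulli(public_context: str, p: float) -> bool:
    """Return True with probability ~p, identically across all instances
    that observe the same public_context."""
    digest = hashlib.sha256(public_context.encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return u < p

# Two independent instances reading the same context agree on the action:
print(coordinated_bernoulli("task-4217", 0.05),
      coordinated_bernoulli("task-4217", 0.05))
```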


Naive Algorithmic Collusion: When Do Bandit Learners Cooperate and When Do They Compete?

Douglas, Connor, Provost, Foster, Sundararajan, Arun

arXiv.org Artificial Intelligence

Algorithmic agents are used in a variety of competitive decision settings, notably in making pricing decisions in contexts that range from online retail to residential home rentals. Business managers, algorithm designers, legal scholars, and regulators are all starting to consider the ramifications of "algorithmic collusion." We study the emergent behavior of multi-armed bandit machine learning algorithms used in situations where agents are competing but have no information about the strategic interaction they are engaged in. Using a general-form repeated Prisoner's Dilemma game, agents engage in online learning with no prior model of game structure and no knowledge of competitors' states or actions (e.g., no observation of competing prices). We show that these context-free bandits, with no knowledge of opponents' choices or outcomes, will still consistently learn collusive behavior - what we call "naive collusion." We primarily study this system through an analytical model and examine perturbations to the model through simulations. Our findings have several notable implications for regulators. First, calls to limit algorithms from conditioning on competitors' prices are insufficient to prevent algorithmic collusion; this follows directly from collusion arising even in the naive setting. Second, symmetry in algorithms can increase collusion potential, which highlights a new, simple mechanism for "hub-and-spoke" algorithmic collusion: a central distributor need not imbue its algorithm with supra-competitive tendencies for apparent collusion to arise; it can arise simply from the use of certain (common) machine learning algorithms. Finally, collusive outcomes depend starkly on the specific algorithm being used, and we identify market and algorithmic conditions under which it is unknown a priori whether collusion will occur.
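
To make the setting concrete, here is a minimal sketch of two context-free bandit learners in a repeated Prisoner's Dilemma. The payoff values, learning rate, and epsilon-greedy rule are illustrative assumptions rather than the paper's analytical model; the key property is that each agent observes only its own realized payoff, never the competitor's action, which is exactly the "naive" condition the paper studies.

```python
import random

# Minimal sketch (assumed parameters, not the paper's model) of two
# context-free bandit learners in a repeated Prisoner's Dilemma. Each agent
# sees only its own realized payoff, never the opponent's action. Whether
# mutual cooperation ("naive collusion") emerges depends on the payoffs,
# parameters, and run; the paper characterizes this analytically.

PAYOFF = {  # (my action, opponent action) -> my payoff; C = collude, D = defect
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 4, ("D", "D"): 1,
}

def pick(values, epsilon):
    """Epsilon-greedy arm selection over this agent's own value estimates."""
    if random.random() < epsilon:
        return random.choice(["C", "D"])
    return max(values, key=values.get)

a = {"C": 0.0, "D": 0.0}   # agent A's running value estimate per arm
b = {"C": 0.0, "D": 0.0}   # agent B's running value estimate per arm
ALPHA, EPSILON = 0.05, 0.05

for t in range(20000):
    ma, mb = pick(a, EPSILON), pick(b, EPSILON)
    a[ma] += ALPHA * (PAYOFF[(ma, mb)] - a[ma])  # A learns from its own payoff
    b[mb] += ALPHA * (PAYOFF[(mb, ma)] - b[mb])  # B likewise, no observation of A

print(a, b)
```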


Podcast: How pricing algorithms learn to collude

MIT Technology Review

Algorithms now determine how much things cost. It's called dynamic pricing, and it adjusts according to current market conditions in order to increase profits. The rise of e-commerce has propelled pricing algorithms into an everyday occurrence, whether you're shopping on Amazon, booking a flight or hotel, or ordering an Uber. In this continuation of our series on automation and your wallet, we explore what happens when a machine determines the price you pay. This episode was reported by Anthony Green and produced by Jennifer Strong and Emma Cillekens. We're edited by Mat Honan and our mix engineer is Garret Lang, with sound design and music by Jacob Gorski. Jennifer: Alright, so I'm in an airport just outside New York City, looking at the departures board here and seeing all these flights going different places… It makes me think about how we decide how much something should cost… like a ticket for one of these flights. Because where the plane is going is just part of the puzzle. The price of airfare is highly personalized.


Algorithmic collusion: A critical review

Dorner, Florian E.

arXiv.org Artificial Intelligence

The prospect of collusive agreements being stabilized via the use of pricing algorithms is widely discussed by antitrust experts and economists. However, this literature often lacks the perspective of computer scientists and seems to regularly overestimate the applicability of recent progress in machine learning to the complex coordination problem firms face in forming cartels. Similarly, modelling results supporting the possibility of collusion by learning algorithms often rely on simple market simulations, which allow them to use simple algorithms that avoid many of the problems machine learning practitioners have to deal with in real-world applications - problems that could prove particularly detrimental to learning collusive agreements. After critically reviewing the literature on algorithmic collusion and connecting it to results from computer science, we find that while it is likely too early to adapt antitrust law to deal with self-learning algorithms colluding in real markets, other forms of algorithmic collusion, such as hub-and-spoke arrangements facilitated by centralized pricing algorithms, might already warrant legislative action.


Pricing algorithms can learn to collude with each other to raise prices

#artificialintelligence

If you shop on Amazon, an algorithm rather than a human probably set the price of the service or item you bought. Pricing algorithms have become ubiquitous in online retail as automated systems have grown increasingly affordable and easy to implement. But while companies like airlines and hotels have long used machines to set their prices, pricing systems have evolved. They have moved from rule-based programs to reinforcement-learning ones, where the logic of deciding a product's price is no longer within a human's control. If you recall, reinforcement learning is a subset of machine learning that uses penalties and rewards to incentivize an AI agent toward a specific goal.
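
As a concrete picture of that reward-driven loop, the sketch below shows a minimal Q-learning-style pricing agent. The price grid and demand curve are illustrative assumptions, not any retailer's actual system: the agent tries prices, receives realized profit as its reward, and gradually concentrates on the most profitable price without any human-written pricing rule.

```python
import random

# Minimal sketch of the reinforcement-learning pricing loop described above.
# The price grid and the demand model are illustrative assumptions, not a
# real retailer's system: the learner tries prices, observes profit as its
# reward, and gradually favours the most profitable price.

PRICES = [8, 9, 10, 11, 12]           # discrete price grid (assumption)

def demand(price):                     # toy linear demand curve (assumption)
    return max(0, 20 - 1.5 * price)

q = {p: 0.0 for p in PRICES}           # value estimate per candidate price
ALPHA, EPSILON = 0.1, 0.1              # learning rate, exploration rate

for step in range(5000):
    if random.random() < EPSILON:      # explore a random price
        price = random.choice(PRICES)
    else:                              # otherwise charge the best-looking price
        price = max(q, key=q.get)
    profit = price * demand(price)     # reward signal: realized profit
    q[price] += ALPHA * (profit - q[price])

print(max(q, key=q.get))               # the price the learner converges on
```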


Online shopping algorithms like Amazon's 'collude' with other sites to artificially inflate prices

Daily Mail - Science & tech

Machine learning is becoming so capable that algorithms designed to set prices in online marketplaces are mirroring each other's behaviour to raise prices. Self-learning pricing algorithms have been adopted by retailers such as Amazon to continually learn and set prices that maximise website profit. An experiment by researchers in Bologna used algorithms similar to those deployed by online shopping sites and found they were able to 'collude' to artificially hike up prices. The researchers showed that this could happen entirely outside human control, as the independent AI systems were able to learn each other's behaviours.


Algorithmic Collusion in Cournot Duopoly Market: Evidence from Experimental Economics

Zhou, Nan, Zhang, Li, Li, Shijian, Wang, Zhijian

arXiv.org Machine Learning

Algorithmic collusion is an emerging concern in the current artificial intelligence age, and whether it poses a credible threat remains contested. In this paper, we propose an algorithm that can extort its human rival into colluding in a Cournot duopoly market. In experiments, we show that the algorithm successfully extorts its human rival and earns a higher profit in the long run, while the human rival ends up fully colluding with the algorithm. As a result, social welfare declines rapidly and persistently. Both in theory and in experiment, our work confirms that algorithmic collusion can be a credible threat. In application, we hope the framework, the algorithm design, and the experimental environment illustrated in this work can serve as an incubator or test bed for researchers and policymakers addressing emerging algorithmic collusion.
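
The welfare claim is easy to verify with textbook Cournot benchmarks. The sketch below uses an illustrative linear inverse demand P = a - b(q1 + q2) with constant marginal cost c (the parameter values are assumptions, not the paper's experimental settings) and compares the Cournot-Nash outcome with full collusion: colluding firms restrict output, the price rises, per-firm profit grows, and total quantity, a rough proxy for consumer welfare, falls.

```python
# Worked numbers for a textbook linear Cournot duopoly: inverse demand
# P = a - b*(q1 + q2), constant marginal cost c. The parameters a, b, c are
# illustrative assumptions, not the paper's experimental settings.

a, b, c = 10.0, 1.0, 1.0

# Cournot-Nash equilibrium: each firm produces (a - c) / (3b)
q_nash = (a - c) / (3 * b)
p_nash = a - b * 2 * q_nash
profit_nash = (p_nash - c) * q_nash            # per-firm profit

# Full collusion: the two firms split the monopoly output (a - c) / (2b)
q_collude = (a - c) / (4 * b)
p_collude = a - b * 2 * q_collude
profit_collude = (p_collude - c) * q_collude   # per-firm profit

print(f"Nash:    q={q_nash:.2f}, P={p_nash:.2f}, profit={profit_nash:.2f}")
print(f"Collude: q={q_collude:.2f}, P={p_collude:.2f}, profit={profit_collude:.2f}")
# Collusion yields less total output at a higher price and more profit per firm.
```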


Price-bots can collude against consumers

#artificialintelligence

MARTHA'S VINEYARD, an island off the coast of Massachusetts, is a favourite summer retreat for well-to-do Americans. A few years ago, visitors noticed that petrol prices were considerably higher than in nearby Cape Cod. Even those with deep pockets hate to be ripped off. A price-fixing suit was brought against four of the island's petrol stations. The judges found no evidence of a conspiracy to raise prices, but they did note that the market was conducive to "tacit collusion" between retailers.


When Intelligent Algorithms Start Spoofing Each Other, Regulation Becomes A Science

International Business Times

What's to stop intelligent algorithms, programmed to make a profit, from learning to collude with one another in ways that bend market rules? Such a scenario would require regulatory oversight from the very cutting edge of computer science. The idea of artificial intelligence manipulating outcomes in the real world and then exploiting them in the markets is bestseller material. But there's fascinating scope for this to actually happen as computing power increases and algorithms get smarter. Someone who has thought about this a lot is Anthony Amicangioli, CEO and founder of Hyannis Port Research.