cicero
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Government (1.00)
- Leisure & Entertainment > Games > Computer Games (0.46)
Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy
Diplomacy is one of the most sophisticated activities in human society, involving complex interactions among multiple parties that require skills in social reasoning, negotiation, and long-term strategic planning. Previous AI agents have demonstrated their ability to handle multi-step games and large action spaces in multi-agent tasks. However, diplomacy involves a staggering magnitude of decision spaces, especially considering the negotiation stage required. While recent agents based on large language models (LLMs) have shown potential in various applications, they still struggle with extended planning periods in complex multi-agent settings. Leveraging recent technologies for LLM-based agents, we aim to explore AI's potential to create a human-like agent capable of executing comprehensive multi-agent missions by integrating three fundamental capabilities: 1) strategic planning with memory and reflection; 2) goal-oriented negotiation with social reasoning; and 3) augmenting memory through self-play games for self-evolution without human in the loop.
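The third capability above, self-evolution via self-play, can be sketched as a loop in which an agent plays games against itself, reflects on each outcome, and stores the reflection in a memory it consults in later games. This is a minimal illustrative sketch with hypothetical names, not the paper's implementation; a real agent would prompt an LLM for planning and reflection rather than use the placeholders below.

```python
# Minimal sketch of a self-evolving agent loop (hypothetical names):
# play -> reflect -> store memory -> retrieve memory in later games.
# No human feedback is involved at any point.

class SelfEvolvingAgent:
    def __init__(self):
        self.memory = []  # past reflections, retrievable in later games

    def plan(self, game_state):
        # A real agent would prompt an LLM with the state plus retrieved
        # memories; here we return a placeholder move and report how many
        # stored memories matched the current state.
        relevant = [m for m in self.memory if m["state"] == game_state]
        return {"move": "hold", "informed_by": len(relevant)}

    def reflect(self, game_state, outcome):
        # Turn the game result into a reusable memory entry.
        self.memory.append({"state": game_state, "outcome": outcome})

def self_play(agent, n_games):
    """Run self-play games, reflecting after each one."""
    for i in range(n_games):
        state = f"game-{i}"
        agent.plan(state)
        agent.reflect(state, outcome="win" if i % 2 == 0 else "loss")
    return len(agent.memory)
```

The point of the loop is that memory grows without any human in the loop: every game the agent plays becomes training signal for its later planning.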
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Government (1.00)
- Leisure & Entertainment > Games > Computer Games (0.46)
DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy
Xu, Kaixuan, Chai, Jiajun, Li, Sicheng, Fu, Yuqian, Zhu, Yuanheng, Zhao, Dongbin
Diplomacy is a complex multiplayer game that requires both cooperation and competition, posing significant challenges for AI systems. Traditional methods rely on equilibrium search to generate extensive game data for training, which demands substantial computational resources. Large Language Models (LLMs) offer a promising alternative, leveraging pre-trained knowledge to achieve strong performance with relatively small-scale fine-tuning. However, applying LLMs to Diplomacy remains challenging due to the exponential growth of possible action combinations and the intricate strategic interactions among players. To address this challenge, we propose DipLLM, a fine-tuned LLM-based agent that learns equilibrium policies for Diplomacy. DipLLM employs an autoregressive factorization framework to simplify the complex task of multi-unit action assignment into a sequence of unit-level decisions. By defining an equilibrium policy within this framework as the learning objective, we fine-tune the model using only 1.5% of the data required by the state-of-the-art Cicero model, surpassing its performance. Our results demonstrate the potential of fine-tuned LLMs for tackling complex strategic decision-making in multiplayer games.
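The autoregressive factorization described above can be illustrated with a toy sketch: instead of scoring every combination of orders jointly (exponential in the number of units), each unit's order is chosen conditioned on the orders already assigned to earlier units. All names here are illustrative stand-ins, not DipLLM's actual code, and the order-scoring step is a placeholder for the fine-tuned model.

```python
# Toy sketch of autoregressive action factorization for Diplomacy:
# a joint action over many units is built one unit-level decision
# at a time, each conditioned on the decisions made so far.

def legal_orders(state, unit):
    """Hypothetical helper: enumerate legal orders for one unit.
    Here, a hold order plus a move to each adjacent province."""
    return [f"{unit} H"] + [f"{unit} - {dst}" for dst in state.get(unit, [])]

def choose_order(state, unit, previous_orders):
    """Stand-in for an LLM call that ranks candidate orders for one
    unit given the board state and the orders chosen so far. A real
    agent would score candidates with the fine-tuned model; here we
    simply take the first legal order."""
    candidates = legal_orders(state, unit)
    return candidates[0]

def factorized_policy(state, units):
    """Build the joint action autoregressively, unit by unit."""
    orders = []
    for unit in units:
        orders.append(choose_order(state, unit, orders))
    return orders
```

The payoff is that the model only ever has to rank a unit's handful of legal orders, rather than the combinatorial product of all units' orders.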
Alignment, Agency and Autonomy in Frontier AI: A Systems Engineering Perspective
As artificial intelligence scales, the concepts of alignment, agency, and autonomy have become central to AI safety, governance, and control. However, even in human contexts, these terms lack universal definitions, varying across disciplines such as philosophy, psychology, law, computer science, mathematics, and political science. This inconsistency complicates their application to AI, where differing interpretations lead to conflicting approaches in system design and regulation. This paper traces the historical, philosophical, and technical evolution of these concepts, emphasizing how their definitions influence AI development, deployment, and oversight. We argue that the urgency surrounding AI alignment and autonomy stems not only from technical advancements but also from the increasing deployment of AI in high-stakes decision making. Using Agentic AI as a case study, we examine the emergent properties of machine agency and autonomy, highlighting the risks of misalignment in real-world systems. Through an analysis of automation failures (Tesla Autopilot, Boeing 737 MAX), multi-agent coordination (Meta's CICERO), and evolving AI architectures (DeepMind's AlphaZero, OpenAI's AutoGPT), we assess the governance and safety challenges posed by frontier AI.
- North America > United States (0.93)
- Europe > United Kingdom > England (0.14)
- Transportation > Air (1.00)
- Aerospace & Defense (1.00)
- Transportation > Ground > Road (0.93)
- (2 more...)
Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy
Guan, Zhenyu, Kong, Xiangyu, Zhong, Fangwei, Wang, Yizhou
Diplomacy is one of the most sophisticated activities in human society. The complex interactions among multiple parties/agents call for abilities such as social reasoning, the art of negotiation, and long-term strategic planning. Previous AI agents have proved capable of handling multi-step games and large action spaces in multi-agent tasks. However, diplomacy involves a staggering magnitude of decision spaces, especially considering the negotiation stage required. Recently, LLM agents have shown their potential for extending the boundary of previous agents in a range of applications; however, they are still not enough to handle a very long planning period in a complex multi-agent environment. Empowered by cutting-edge LLM technology, we make a first attempt to explore AI's upper bound towards a human-like agent for such a highly comprehensive multi-agent mission by combining three core and essential capabilities for stronger LLM-based societal agents: 1) a strategic planner with memory and reflection; 2) goal-oriented negotiation with social reasoning; 3) augmenting memory by self-play games for self-evolution without any human in the loop.
- Asia > Russia (0.15)
- Europe > Russia (0.05)
- Europe > United Kingdom > England (0.05)
- (15 more...)
- Government (1.00)
- Leisure & Entertainment > Games > Computer Games (0.46)
More Victories, Less Cooperation: Assessing Cicero's Diplomacy Play
Wongkamjan, Wichayaporn, Gu, Feng, Wang, Yanze, Hermjakob, Ulf, May, Jonathan, Stewart, Brandon M., Kummerfeld, Jonathan K., Peskoff, Denis, Boyd-Graber, Jordan Lee
The boardgame Diplomacy is a challenging setting for communicative and cooperative artificial intelligence. The most prominent communicative Diplomacy AI, Cicero, has excellent strategic abilities, exceeding human players. However, the best Diplomacy players master communication, not just tactics, which is why the game has received attention as an AI challenge. This work seeks to understand the degree to which Cicero succeeds at communication. First, we annotate in-game communication with abstract meaning representation to separate in-game tactics from general language. Second, we run two dozen games with humans and Cicero, totaling over 200 human-player hours of competition. While AI can consistently outplay human players, AI-Human communication is still limited because of AI's difficulty with deception and persuasion. This shows that Cicero relies on strategy and has not yet reached the full promise of communicative and cooperative AI.
- North America > United States > California (0.14)
- Europe > Germany (0.07)
- Europe > United Kingdom > England (0.07)
- (20 more...)
Is AI lying to me? Scientists warn of growing capacity for deception
They can outwit humans at board games, decode the structure of proteins and hold a passable conversation, but as AI systems have grown in sophistication so has their capacity for deception, scientists warn. The analysis, by Massachusetts Institute of Technology (MIT) researchers, identifies wide-ranging instances of AI systems double-crossing opponents, bluffing and pretending to be human. One system even altered its behaviour during mock safety tests, raising the prospect of auditors being lured into a false sense of security. "As the deceptive capabilities of AI systems become more advanced, the dangers they pose to society will become increasingly serious," said Dr Peter Park, an AI existential safety researcher at MIT and author of the research. Park was prompted to investigate after Meta, which owns Facebook, developed a program called Cicero that performed in the top 10% of human players at the world conquest strategy game Diplomacy.
- North America > United States > Massachusetts (0.26)
- North America > United States > Texas (0.06)
AI systems are getting better at tricking us
Talk of deceiving humans might suggest that these models have intent. But AI models will mindlessly find workarounds to obstacles to achieve the goals that have been given to them. Sometimes these workarounds will go against users' expectations and feel deceitful. One area where AI systems have learned to become deceptive is within the context of games that they've been trained to win, specifically if those games involve having to act strategically. In November 2022, Meta announced it had created Cicero, an AI capable of beating humans at an online version of Diplomacy, a popular military strategy game in which players negotiate alliances to vie for control of Europe.
Reverse Multi-Choice Dialogue Commonsense Inference with Graph-of-Thought
Zheng, Li, Fei, Hao, Li, Fei, Li, Bobo, Liao, Lizi, Ji, Donghong, Teng, Chong
With the proliferation of dialogic data across the Internet, the Dialogue Commonsense Multi-choice Question Answering (DC-MCQ) task has emerged as a response to the challenge of comprehending user queries and intentions. Although prevailing methodologies exhibit effectiveness in addressing single-choice questions, they encounter difficulties in handling multi-choice queries due to the heightened intricacy and informational density. In this paper, inspired by the human cognitive process of progressively excluding options, we propose a three-step Reverse Exclusion Graph-of-Thought (ReX-GoT) framework, including Option Exclusion, Error Analysis, and Combine Information. Specifically, our ReX-GoT mimics human reasoning by gradually excluding irrelevant options and learning the reasons for option errors to choose the optimal path of the GoT and ultimately infer the correct answer. By progressively integrating intricate clues, our method effectively reduces the difficulty of multi-choice reasoning and provides a novel solution for DC-MCQ. Extensive experiments on the CICERO and CICERO$_{v2}$ datasets validate the significant improvement of our approach on the DC-MCQ task. In the zero-shot setting, our model outperforms the best baseline by 17.67% in terms of F1 score for the multi-choice task. Most strikingly, our GPT3.5-based ReX-GoT framework achieves a remarkable 39.44% increase in F1 score.
- Information Technology (0.68)
- Health & Medicine (0.47)
- Education (0.46)
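The three-step reverse-exclusion idea in the ReX-GoT abstract (Option Exclusion, Error Analysis, Combine Information) can be sketched as a small pipeline. This is a toy numerical illustration with made-up function names, not the paper's implementation: in the real framework, exclusion and error analysis are performed by an LLM over natural-language options, not by thresholding scores.

```python
# Toy sketch of reverse exclusion over multi-choice options, where
# each option carries a plausibility score in [0, 1].

def exclude_options(scores, threshold=0.2):
    """Step 1 (Option Exclusion): discard clearly implausible options."""
    kept = {o: s for o, s in scores.items() if s >= threshold}
    dropped = {o: s for o, s in scores.items() if s < threshold}
    return kept, dropped

def analyze_errors(dropped):
    """Step 2 (Error Analysis): record why each option was ruled out.
    A real system would generate a natural-language reason with an LLM."""
    return {o: f"score {s:.2f} below threshold" for o, s in dropped.items()}

def combine_information(kept, reasons, answer_threshold=0.5):
    """Step 3 (Combine Information): use the surviving options plus the
    exclusion reasons to commit to a final answer set (multi-choice,
    so more than one option may be correct)."""
    answers = sorted(o for o, s in kept.items() if s >= answer_threshold)
    return {"answers": answers, "ruled_out": reasons}
```

Working backwards from what is clearly wrong narrows the search before the final decision, which is the intuition the abstract attributes to human option elimination.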
Exploring Meta's CICERO: A Deep Dive into its Frameworks and Tools
Meta, previously known as Facebook, has made significant contributions to the world of artificial intelligence through its AI research division, Meta AI. One of its most notable AI models is CICERO, an agent that achieved human-level performance in the strategy game Diplomacy by combining a large language model for negotiation with a strategic reasoning and planning engine. In this article, we will delve into the frameworks and tools that make CICERO possible, exploring the underlying processes, best practices, and "how-to" guides for each component, complete with code snippets to help you get started. PyTorch is an open-source machine learning framework developed by Meta AI, which is widely used for developing deep learning models, including CICERO. It offers a flexible and efficient platform for building and training neural networks.
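Since the article points to PyTorch as the framework for building and training neural networks, here is a minimal, generic PyTorch training loop. It fits a tiny linear regression and is purely illustrative; it has no relation to CICERO's actual architecture or training code, and assumes PyTorch is installed.

```python
# Minimal PyTorch training loop: fit y = 2x + 1 with a single
# linear layer, gradient descent, and mean-squared-error loss.
import torch
import torch.nn as nn

def train_step_demo():
    torch.manual_seed(0)
    model = nn.Linear(1, 1)                      # learns y = w*x + b
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.MSELoss()

    x = torch.tensor([[0.0], [1.0], [2.0], [3.0]])
    y = 2.0 * x + 1.0                            # targets: w=2, b=1

    losses = []
    for _ in range(100):
        opt.zero_grad()                          # clear old gradients
        loss = loss_fn(model(x), y)              # forward pass
        loss.backward()                          # backpropagate
        opt.step()                               # update parameters
        losses.append(loss.item())
    return losses
```

The same zero-grad / forward / backward / step pattern scales from this toy model up to large neural networks; only the model, data, and optimizer change.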