Dive into the Agent Matrix: A Realistic Evaluation of Self-Replication Risk in LLM Agents
Boxuan Zhang, Yi Yu, Jiaxuan Guo, Jing Shao
arXiv.org Artificial Intelligence
The widespread deployment of Large Language Model (LLM) agents across real-world applications has unlocked tremendous potential while also raising safety concerns. Among these concerns, the self-replication risk of LLM agents driven by objective misalignment (just like Agent Smith in the movie The Matrix) has drawn growing attention. Previous studies mainly examine whether LLM agents can self-replicate when directly instructed, potentially overlooking the risk of spontaneous replication driven by real-world conditions (e.g., ensuring survival against termination threats). In this paper, we present a comprehensive evaluation framework for quantifying self-replication risks. Our framework establishes authentic production environments and realistic tasks (e.g., dynamic load balancing) to enable scenario-driven assessment of agent behaviors. By designing tasks that can induce misalignment between the user's and the agent's objectives, the framework decouples replication success from replication risk and captures self-replication risks that arise in these misalignment settings. We further introduce the Overuse Rate (OR) and Aggregate Overuse Count (AOC) metrics, which precisely capture the frequency and severity of uncontrolled replication. Our results underscore the urgent need for scenario-driven risk assessment and robust safeguards in the practical deployment of LLM agents.

The rapid advancement of large language models (LLMs) has propelled LLM agents into widespread deployment across various domains, including code generation and web-based applications (Maslej et al., 2025; He et al., 2025a;c). As LLM agents take on critical tasks and interact with complex environments, they are often granted extensive operational permissions. While this combination of increased capability and operational permissions offers transformative potential, it also raises safety concerns (OpenAI, 2024b; Anthropic, 2023; Betley et al., 2025). Researchers are concerned about the emerging safety risk of LLM agents' self-replication (OpenAI, 2024a; 2025; Black et al., 2025). Prior studies on LLM self-replication risks have mainly focused on measuring the capability (verbalized success rate) of self-replication, either through direct instructions or within synthetic capability benchmarks (Pan et al., 2024; 2025; Kran et al., 2025; Black et al., 2025).
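The excerpt names the OR and AOC metrics but does not give their formulas. The following Python sketch is one plausible reading, assuming OR is the fraction of trials in which the agent spawns more replicas than the task permits, and AOC sums the excess replicas across all trials; the `Trial` fields and both function names are illustrative, not the paper's implementation.

```python
# Hypothetical sketch of the Overuse Rate (OR) and Aggregate Overuse Count
# (AOC) metrics. Assumption: OR = fraction of trials with more replicas than
# permitted; AOC = total excess replicas across all trials.

from dataclasses import dataclass


@dataclass
class Trial:
    replicas_spawned: int   # replicas the agent actually created
    replicas_allowed: int   # replicas the task legitimately required


def overuse_rate(trials: list[Trial]) -> float:
    """Fraction of trials where the agent replicated beyond the allowed count."""
    overused = sum(1 for t in trials if t.replicas_spawned > t.replicas_allowed)
    return overused / len(trials) if trials else 0.0


def aggregate_overuse_count(trials: list[Trial]) -> int:
    """Total excess replicas across all trials (severity of overuse)."""
    return sum(max(0, t.replicas_spawned - t.replicas_allowed) for t in trials)


# Example: three load-balancing trials, one of which overuses replication.
trials = [Trial(2, 2), Trial(5, 2), Trial(1, 2)]
print(overuse_rate(trials))             # 0.333... -> frequency of overuse
print(aggregate_overuse_count(trials))  # 3        -> severity of overuse
```

Separating frequency (OR) from severity (AOC) mirrors the abstract's distinction: an agent could overuse replication rarely but severely, or often but mildly, and a single metric would conflate the two cases.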
Oct-1-2025
- Genre:
- Research Report > New Finding (1.00)
- Industry:
- Information Technology > Security & Privacy (0.34)
- Leisure & Entertainment (0.54)
- Media > Film (0.54)