Reply to "Emergent LLM behaviors are observationally equivalent to data leakage"
Ariel Flint Ashery, Luca Maria Aiello, Andrea Baronchelli
arXiv.org Artificial Intelligence
Abstract

A potential concern when simulating populations of large language models (LLMs) is data contamination, i.e. the possibility that training data may shape outcomes in unintended ways. While this concern is important and may hinder certain experiments with multi-agent models, it does not preclude the study of genuinely emergent dynamics in LLM populations. The recent critique by Barrie and Törnberg [1] of the results of Flint Ashery et al. [2] offers an opportunity to clarify that self-organisation and model-dependent emergent dynamics can be studied in LLM populations, highlighting how such dynamics have been empirically observed in the specific case of social conventions.

Barrie and Törnberg [1] question whether the emergence of conventions observed in our recent study of interacting large language models (LLMs) [2] can be attributed to genuine collective dynamics, or whether it instead results from data leakage from the models' training data. In this note, we respond to their main points and argue that the observed dynamics cannot be explained by data contamination alone.
Jun-24-2025