David, Isaac
AuthorMist: Evading AI Text Detectors with Reinforcement Learning
David, Isaac, Gervais, Arthur
In the age of powerful AI-generated text, automatic detectors have emerged to identify machine-written content. This poses a threat to author privacy and freedom, as text authored with AI assistance may be unfairly flagged. We propose AuthorMist, a novel reinforcement learning-based system to transform AI-generated text into human-like writing. AuthorMist leverages a 3-billion-parameter language model as a backbone, fine-tuned with Group Relative Policy Optimization (GPRO) to paraphrase text in a way that evades AI detectors. Our framework establishes a generic approach where external detector APIs (GPTZero, WinstonAI, Originality.ai, etc.) serve as reward functions within the reinforcement learning loop, enabling the model to systematically learn outputs that these detectors are less likely to classify as AI-generated. This API-as-reward methodology can be applied broadly to optimize text against any detector with an accessible interface. Experiments on multiple datasets and detectors demonstrate that AuthorMist effectively reduces the detectability of AI-generated text while preserving the original meaning. Our evaluation shows attack success rates ranging from 78.6% to 96.2% against individual detectors, significantly outperforming baseline paraphrasing methods. AuthorMist maintains high semantic similarity (above 0.94) with the original text while successfully evading detection. These results highlight limitations in current AI text detection technologies and raise questions about the sustainability of the detection-evasion arms race.
Dissociating Artificial Intelligence from Artificial Consciousness
Findlay, Graham, Marshall, William, Albantakis, Larissa, David, Isaac, Mayner, William GP, Koch, Christof, Tononi, Giulio
Developments in machine learning and computing power suggest that artificial general intelligence is within reach. This raises the question of artificial consciousness: if a computer were to be functionally equivalent to a human, being able to do all we do, would it experience sights, sounds, and thoughts, as we do when we are conscious? Answering this question in a principled manner can only be done on the basis of a theory of consciousness that is grounded in phenomenology and that states the necessary and sufficient conditions for any system, evolved or engineered, to support subjective experience. Here we employ Integrated Information Theory (IIT), which provides principled tools to determine whether a system is conscious, to what degree, and the content of its experience. We consider pairs of systems constituted of simple Boolean units, one of which -- a basic stored-program computer -- simulates the other with full functional equivalence. By applying the principles of IIT, we demonstrate that (i) two systems can be functionally equivalent without being phenomenally equivalent, and (ii) that this conclusion is not dependent on the simulated system's function. We further demonstrate that, according to IIT, it is possible for a digital computer to simulate our behavior, possibly even by simulating the neurons in our brain, without replicating our experience. This contrasts sharply with computational functionalism, the thesis that performing computations of the right kind is necessary and sufficient for consciousness.