Debate-Driven Multi-Agent LLMs for Phishing Email Detection

Nguyen, Ngoc Tuong Vy, Childress, Felix D, Yin, Yunting

arXiv.org Artificial Intelligence 

METHODS

A. Multi-Agent Debate Framework

We propose a multi-agent debate framework for phishing email detection, composed of three components: two debater agents, a pre-defined and scripted debate procedure, and a judge agent.

The debater agents consist of two LLM-based instances, which may be instantiated from the same or different models. The first agent is prompted to argue that the given email is a phishing attempt, while the second agent is prompted to respond to the first agent's output by countering those claims and arguing for the email's legitimacy. The two agents then engage in one more round, so that each side can strengthen its arguments while the debate stays computationally inexpensive.

The debate procedure is pre-defined and scripted to generate template prompts for each email in the dataset:

1) Round One:
   Carefully analyze the following email and argue why it is likely to be a phishing attempt (Agent 1)
   Carefully analyze the following email and argue why it is likely to be legitimate and not a phishing attempt (Agent 2)

2) Round Two:
   Given your opponent's rebuttal, reinforce your position that the following email is a phishing attempt (Agent 1)
   Given your opponent's rebuttal, reinforce your position that the following email is not a phishing attempt (Agent 2)

Arguments made by the two agents are logged for subsequent judge evaluation.
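The scripted two-round procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `Agent`, `build_prompt`, and `run_debate` are hypothetical, each agent is assumed to be any callable that maps a prompt string to a reply, and the way earlier arguments are appended to later prompts is an assumption, since the paper does not specify the exact prompt layout. Only the four instruction templates are taken from the text.

```python
from typing import Callable, Dict, List

# Hypothetical type: any function that maps a prompt string to a model reply.
Agent = Callable[[str], str]

# Instruction templates quoted from the scripted debate procedure.
ROUND_ONE = {
    "agent_1": "Carefully analyze the following email and argue why it is "
               "likely to be a phishing attempt",
    "agent_2": "Carefully analyze the following email and argue why it is "
               "likely to be legitimate and not a phishing attempt",
}
ROUND_TWO = {
    "agent_1": "Given your opponent's rebuttal, reinforce your position that "
               "the following email is a phishing attempt",
    "agent_2": "Given your opponent's rebuttal, reinforce your position that "
               "the following email is not a phishing attempt",
}


def build_prompt(instruction: str, email: str, transcript: List[Dict]) -> str:
    """Combine instruction, email, and prior arguments (layout is an assumption)."""
    parts = [instruction + ":", "", "Email:", email]
    if transcript:
        parts += ["", "Debate so far:"]
        parts += [
            f"[round {t['round']}] {t['speaker']}: {t['argument']}"
            for t in transcript
        ]
    return "\n".join(parts)


def run_debate(email: str, agent_1: Agent, agent_2: Agent) -> List[Dict]:
    """Run both rounds and return the logged arguments for a later judge."""
    agents = {"agent_1": agent_1, "agent_2": agent_2}
    transcript: List[Dict] = []
    for round_no, templates in enumerate((ROUND_ONE, ROUND_TWO), start=1):
        # Agent 1 speaks first in each round; Agent 2 then sees its output.
        for speaker in ("agent_1", "agent_2"):
            prompt = build_prompt(templates[speaker], email, transcript)
            argument = agents[speaker](prompt)
            transcript.append(
                {"round": round_no, "speaker": speaker, "argument": argument}
            )
    return transcript


if __name__ == "__main__":
    # Stub agents stand in for real LLM calls so the sketch runs offline.
    log = run_debate(
        "Your account is locked. Verify at the link below.",
        agent_1=lambda prompt: "Urgent tone and a credential request.",
        agent_2=lambda prompt: "Could be a routine security notice.",
    )
    for turn in log:
        print(turn)
```

In this sketch the returned list plays the role of the debate log: it holds four entries (two per round) that could be passed unchanged to a judge agent. Swapping the stub lambdas for wrappers around real model calls would not change the control flow.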