We first take a theoretical approach to analyzing debate and provide a framework through which debate can be mathematically examined. Building on this framework, we provide several theoretical results for multi-agent debate.
We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs), which are MDPs with conditionally independent transition components.
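As an illustration of "conditionally independent transition components", the factored structure is commonly written as a product of per-component kernels. The notation below is assumed for this sketch and does not come from the text above: $m$ is the number of state components, $s'[i]$ is the $i$-th component of the next state, $Z_i$ is the scope (the subset of state-action variables that component $i$ depends on), and $P_i$ is the component transition kernel.

```latex
% Assumed notation: m components, scope sets Z_i, component kernels P_i.
% Given the current state-action pair (s, a), the next-state components
% are drawn independently, each depending only on its own scope.
\[
  P(s' \mid s, a) \;=\; \prod_{i=1}^{m} P_i\bigl(s'[i] \,\bigm|\, (s, a)[Z_i]\bigr)
\]
```

When each scope $Z_i$ is small, every factor $P_i$ is defined over far fewer inputs than the full transition kernel, which is what makes the factored setting attractive for learning.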
The machines gain possibly different utilities by processing different jobs, and all jobs assigned to the same machine should be processed without overlap.
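The two ingredients of this sentence, machine-dependent utilities and the no-overlap constraint, can be sketched as follows. This is a minimal illustration under stated assumptions, not the formulation of the original work: jobs are assumed to be half-open time intervals `(start, end)`, and the names `is_feasible`, `total_utility`, and the `utility` table are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: a job is a half-open interval [start, end).
Job = Tuple[float, float]


def is_feasible(jobs_on_machine: List[Job]) -> bool:
    """Return True if no two jobs on one machine overlap in time."""
    ordered = sorted(jobs_on_machine)  # sort by start time
    # After sorting, any overlap shows up between neighbouring jobs.
    return all(prev[1] <= nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


def total_utility(
    assignment: Dict[str, List[int]],
    utility: Dict[Tuple[str, int], float],
) -> float:
    """Sum machine-dependent utilities over all assigned (machine, job) pairs."""
    return sum(
        utility[(machine, job_id)]
        for machine, job_ids in assignment.items()
        for job_id in job_ids
    )


# Hypothetical instance: three jobs, two machines.
jobs = [(0, 2), (1, 3), (2, 5)]
assignment = {"A": [0, 2], "B": [1]}
utility = {("A", 0): 4.0, ("A", 2): 1.5, ("B", 1): 3.0}

feasible = all(is_feasible([jobs[j] for j in ids]) for ids in assignment.values())
value = total_utility(assignment, utility)
```

Here jobs 0 and 2 share machine `A` because one ends exactly when the other starts. Placing jobs 0 and 1 on the same machine would violate the constraint.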