HiddenBench: Assessing Collective Reasoning in Multi-Agent LLMs via Hidden Profile Tasks
