Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents
