WhodunitBench: Evaluating Large Multimodal Agents via Murder Mystery Games

Neural Information Processing Systems 

Recent advances in large language models (LLMs) have enabled the development of large multimodal agents (LMAs).