WhodunitBench: Evaluating Large Multimodal Agents via Murder Mystery Games
