PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation
