Table 2: self-play vs. interactive eval
Neural Information Processing Systems
Table columns: Reward, Quality, Fluency, Diversity, Contingen.

Reward exploitation in RL is a known problem and an active area of research (Amodei et al., 2016). Additionally, we have run further experiments and provide strong empirical evidence that our proposed metrics are not easily exploitable.

Primary (evaluation) and secondary (EI) contributions [R2, R3]: The main contribution of this work is an evaluation methodology that captures higher-level human conversation concepts.
Feb-15-2026, 08:07:16 GMT