Table 2: self-play vs. interactive eval

Neural Information Processing Systems 

Reward Quality Fluency Diversity Contingen.

Reward exploitation in RL is a known problem and an active area of research (Amodei et al., 2016). Additionally, we have run further experiments and provide strong empirical evidence that our proposed metrics are not easily exploitable.

Primary (evaluation) and secondary (EI) contributions [R2, R3]: The main contribution of this work is an evaluation methodology that captures higher-level human conversation concepts.
