Reviews: Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems
–Neural Information Processing Systems
This paper explores interesting directions, in particular 1) using interactive settings to evaluate a model rather than a single answer, and 2) combining different automated metrics in a weighted sum (e.g., based on sentiment) to approximate human evaluation. Reviewers have raised crucial points regarding gameability (using the metrics for training a model is risky unless followed by a non-gameable evaluation) and the lack of comparability between different self-play setups. It is indeed a much better evaluation setting if the system does not control both sides of the conversation (e.g., models being matched against the same set of fixed models), so the authors should definitely pursue that direction. However, I expect this work would still be interesting to the dialog community: many of the diagnostic advantages of the model-talking-to-model setting remain in practice, especially because the model is not trained with the self-play objective; that criterion is only used post hoc, so the system cannot extensively exploit it during training. In practice, many of the problems in a given model's generations already show up during self-play, and the reviewers' reasonable worry that the model could exploit the metric remains theoretical for the moment.
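The metric-combination idea under discussion can be sketched as follows. This is a minimal illustration, not the paper's actual method: it assumes per-conversation automated metrics (the metric names and toy data below are hypothetical) and fits weights by ordinary least squares so that the weighted sum approximates human ratings.

```python
# Hypothetical sketch: approximate human quality ratings as a weighted
# sum of automated per-conversation metrics. Metric columns (e.g.,
# sentiment, coherence) and all numbers are illustrative assumptions.
import numpy as np


def fit_metric_weights(metrics, human_scores):
    """Least-squares fit of weights w so that metrics @ w ~ human_scores.

    metrics: (n_conversations, n_metrics) array of automated metric values.
    human_scores: (n_conversations,) array of human quality ratings.
    """
    w, *_ = np.linalg.lstsq(metrics, human_scores, rcond=None)
    return w


def combined_score(metric_values, w):
    """Weighted-sum proxy for the human evaluation of one conversation."""
    return float(np.dot(metric_values, w))


# Toy usage with made-up metric values and human ratings.
M = np.array([[0.8, 0.6],
              [0.2, 0.3],
              [0.9, 0.7]])
h = np.array([4.0, 1.5, 4.5])
w = fit_metric_weights(M, h)
```

As the review notes, such a learned proxy is only safe when used post hoc for diagnosis; optimizing a model directly against it invites metric exploitation.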
Jun-2-2025, 01:09:11 GMT