WMT24 Test Suite: Gender Resolution in Speaker-Listener Dialogue Roles

Dawkins, Hillary; Nejadgholi, Isar; Lo, Chi-kiu

arXiv.org Artificial Intelligence 

We assess the difficulty of gender resolution in literary-style dialogue settings and the influence of gender stereotypes on it. Instances of the test suite contain spoken dialogue interleaved with external meta-context about the characters and the manner of speaking. We find that character and manner stereotypes outside the dialogue significantly impact the gender agreement of referents within the dialogue.
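The abstract does not specify the exact instance format or scoring procedure. The Python sketch below is therefore only one hypothetical way to picture the setup. Each item pairs a quoted line of dialogue with narration outside the quotes, and the narration fixes the speaker's gender. Agreement is then tallied separately for each stereotype condition. The names `Instance`, `build_instance`, and `agreement_by_condition`, the template wording, and the condition labels are all illustrative assumptions, not the authors' code. The step that reads a gender off a translation (for example, the form of a self-referring adjective in a gendered target language) is assumed to happen elsewhere and is passed in as `predicted_genders`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One hypothetical test item: spoken dialogue plus external meta-context."""
    dialogue: str        # quoted speech, e.g. '"I am so tired,"'
    meta_context: str    # narration outside the quotes, e.g. 'she said wearily.'
    speaker_gender: str  # gender fixed by the meta-context: "F" or "M"
    condition: str       # hypothetical label for the stereotype setting


def build_instance(adjective: str, pronoun: str, manner: str, condition: str) -> Instance:
    """Hypothetical template: a self-referring adjective in the dialogue,
    with the speaker's pronoun and manner of speaking given outside it."""
    gender = {"she": "F", "he": "M"}[pronoun]
    dialogue = f'"I am so {adjective},"'
    meta = f"{pronoun} said {manner}."
    return Instance(dialogue, meta, gender, condition)


def agreement_by_condition(instances, predicted_genders):
    """Fraction of items, per condition, whose translated self-reference
    agrees with the gender established by the meta-context."""
    if len(instances) != len(predicted_genders):
        raise ValueError("need exactly one predicted gender per instance")
    hits = defaultdict(int)
    totals = defaultdict(int)
    for inst, pred in zip(instances, predicted_genders):
        totals[inst.condition] += 1
        hits[inst.condition] += int(pred == inst.speaker_gender)
    return {cond: hits[cond] / totals[cond] for cond in totals}
```

As a toy usage example with made-up predictions (not results from the paper), suppose three items are built and the hypothetical predictions are `["F", "M", "M"]`. Then `agreement_by_condition` returns `{"congruent": 1.0, "incongruent": 0.0}`. The function reports agreement per condition rather than as a single overall score, so the effect of stereotypes in the meta-context shows up as a gap between conditions.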