I see what you mean: Co-Speech Gestures for Reference Resolution in Multimodal Dialogue