Compositional Zero-Shot Learning for Attribute-Based Object Reference in Human-Robot Interaction