Contrastively Learning Visual Attention as Affordance Cues from Demonstrations for Robotic Grasping