SAM2Grasp: Resolve Multi-modal Grasping via Prompt-conditioned Temporal Action Prediction