M2R2: MultiModal Robotic Representation for Temporal Action Segmentation