SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation