Sequence-to-Sequence Multi-Modal Speech In-Painting