Encoder-decoder multimodal speaker change detection