Conditional Variational Autoencoder for Sign Language Translation with Cross-Modal Alignment