Transcription of videos with XLSR Wav2Vec2 fine-tuned on a custom dataset, and translation with mBART