Build a custom speech-to-text model with speaker diarization capabilities