Speaker and Language Change Detection using Wav2vec2 and Whisper
Tijn Berns, Nik Vaessen, David A. van Leeuwen
arXiv.org Artificial Intelligence
We investigate recent transformer networks pre-trained for automatic speech recognition for their ability to detect speaker and language changes in speech. We do this by simply adding speaker (change) or language targets to the labels. For Wav2vec2 pre-trained networks, we also investigate whether the representation for the speaker change symbol can be conditioned to …

A penalty was needed to compensate for the difference in the number of parameters, but tuning the weight of this penalty was considered a weakness, which [3] cleverly circumvented by fixing the number of model parameters when going from a single model to two models. In the neural era, [4] applied an LSTM to the sole task of speaker change detection (SCD), labelling individual frames with a speaker change boolean, after convolving the single speaker change labels with a unit block function to account for class imbalance.
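The label-widening step attributed to [4] can be illustrated with a short sketch. It assumes that a speaker change is marked by a single positive frame and that the "unit block function" is a boxcar kernel of `width` ones (an odd width, so the block is centred on the change). Each isolated positive frame then becomes `width` positive frames, which reduces the imbalance between change and non-change frames. The function names `block_kernel`, `convolve_same` and `widen_change_labels`, and the example widths, are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def block_kernel(width: int) -> List[float]:
    """Unit block (boxcar) kernel: `width` ones. Odd width keeps it centred (assumption)."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    return [1.0] * width


def convolve_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Discrete convolution truncated to the input length (kernel centred on each frame)."""
    half = len(kernel) // 2
    n = len(signal)
    out = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + half - k  # index of the input frame that kernel tap k lands on
            if 0 <= j < n:
                acc += w * signal[j]
        out[i] = acc
    return out


def widen_change_labels(change_frames: Sequence[int], num_frames: int, width: int) -> List[bool]:
    """Turn single-frame speaker change marks into per-frame booleans spread over `width` frames."""
    labels = [0.0] * num_frames
    for f in change_frames:
        labels[f] = 1.0  # one positive frame per speaker change
    smeared = convolve_same(labels, block_kernel(width))
    # Any frame touched by the block around a change is labelled as a change frame.
    return [v > 0 for v in smeared]


# A change at frame 5 of a 10-frame sequence, widened with a 3-frame block:
print(widen_change_labels([5], num_frames=10, width=3))
```

The printed labels mark frames 4, 5 and 6 as change frames. At the sequence edges the block is simply truncated, so a change at frame 0 with `width=5` yields only three positive frames.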
Feb-18-2023