Build a custom speech-to-text model with speaker diarization capabilities
In this code pattern, learn how to train custom language and acoustic Speech to Text models that transcribe audio files and produce speaker-diarized output, given a corpus file and audio recordings of a meeting or classroom. One feature of the IBM Watson Speech to Text service is the ability to detect different speakers in an audio file, known as speaker diarization. This code pattern demonstrates that capability by training a custom language model with a corpus text file, which teaches the model 'Out of Vocabulary' words, and a custom acoustic model with the audio files, which teaches the model 'Accent' detection, all in a Python Flask runtime. Get detailed instructions in the README file.

This code pattern is part of the Extracting insights from videos with IBM Watson use case series, which showcases a solution for extracting meaningful insights from videos using the Watson Speech to Text, Watson Natural Language Processing, and Watson Tone Analyzer services.
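To illustrate what speaker-diarized output looks like, here is a minimal sketch of turning a Speech to Text JSON result into a per-speaker transcript. The `response` dict below is a hand-made sample that only mimics the shape of a result returned with `speaker_labels=True` (words with timestamps in `results`, and `speaker_labels` entries keyed by start time); it is not real service output, and the grouping logic is an illustrative assumption rather than the code pattern's actual implementation.

```python
# Hand-made sample mimicking the shape of a Watson Speech to Text
# response requested with speaker_labels=True (illustrative data only).
response = {
    "results": [
        {"alternatives": [{
            "transcript": "hello welcome to the meeting ",
            "timestamps": [["hello", 0.1, 0.5], ["welcome", 0.6, 1.0],
                           ["to", 1.0, 1.1], ["the", 1.1, 1.2],
                           ["meeting", 1.2, 1.7]]}]},
        {"alternatives": [{
            "transcript": "thanks glad to be here ",
            "timestamps": [["thanks", 2.0, 2.4], ["glad", 2.5, 2.8],
                           ["to", 2.8, 2.9], ["be", 2.9, 3.0],
                           ["here", 3.0, 3.4]]}]},
    ],
    "speaker_labels": [
        {"from": 0.1, "to": 0.5, "speaker": 0},
        {"from": 0.6, "to": 1.0, "speaker": 0},
        {"from": 1.0, "to": 1.1, "speaker": 0},
        {"from": 1.1, "to": 1.2, "speaker": 0},
        {"from": 1.2, "to": 1.7, "speaker": 0},
        {"from": 2.0, "to": 2.4, "speaker": 1},
        {"from": 2.5, "to": 2.8, "speaker": 1},
        {"from": 2.8, "to": 2.9, "speaker": 1},
        {"from": 2.9, "to": 3.0, "speaker": 1},
        {"from": 3.0, "to": 3.4, "speaker": 1},
    ],
}

def diarize(response):
    """Group transcribed words into segments by speaker label."""
    # Map each word's start time to the speaker assigned to it.
    speaker_at = {lbl["from"]: lbl["speaker"]
                  for lbl in response["speaker_labels"]}
    segments = []          # list of (speaker, [words])
    current = None
    for result in response["results"]:
        for word, start, _end in result["alternatives"][0]["timestamps"]:
            spk = speaker_at.get(start)
            if current and current[0] == spk:
                current[1].append(word)      # same speaker: extend segment
            else:
                current = (spk, [word])      # speaker changed: new segment
                segments.append(current)
    return [f"Speaker {spk}: {' '.join(words)}" for spk, words in segments]

for line in diarize(response):
    print(line)
# → Speaker 0: hello welcome to the meeting
# → Speaker 1: thanks glad to be here
```

In a real run, `response` would come from the service's `recognize` call against the trained custom models; the grouping step afterward would be the same.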
Nov-3-2021, 05:40:30 GMT