Using Whisper (speech-to-text) and Tortoise (text-to-speech)