What does it take to get state of the art in simultaneous speech-to-speech translation?
