Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments
