Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding

Open in new window