Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding