Masked Audio Text Encoders are Effective Multi-Modal Rescorers
