Masked Audio Text Encoders are Effective Multi-Modal Rescorers