Matching Latent Encoding for Audio-Text based Keyword Spotting