Flexible Keyword Spotting based on Homogeneous Audio-Text Embedding