Do Audio-Language Models Understand Linguistic Variations?