Audio Difference Learning for Audio Captioning