Audio Explanation Synthesis with Generative Foundation Models