Unsupervised Structure Discovery for Semantic Analysis of Audio