Anchor-aware Deep Metric Learning for Audio-visual Retrieval