Establishing degrees of closeness between audio recordings along different dimensions using large-scale cross-lingual models